I, L. MICHAEL FURNESS, a citizen of the United Kingdom, residing at 2
Brookside, Exning, Newmarket, United Kingdom, declare that: 

1. I was employed by Incyte Genomics, Inc. (hereinafter "Incyte") as a Director
of Pharmacogenomics until December 31, 2001. I am currently under contract to be a Consultant to
Incyte Genomics, Inc.

2. In 1984, I received a B.Sc.(Hons) in Biomolecular Science (Biophysics and 
Biochemistry) from Portsmouth Polytechnic. 

From 1985 to 1987 I was at the School of Pharmacy in London, United Kingdom,
during which time I analyzed lipid methyltransferase enzymes using a variety of protein analysis
methods, including one-dimensional (1D) and two-dimensional (2D) gel electrophoresis, HPLC,
and a variety of enzymatic assay systems.

I then worked in the Protein Structure group at the National Institute for Medical
Research until 1989, setting up core facilities for nucleic acid synthesis and sequencing, as well as
assisting in programs on protein kinase C inhibitors.
After a year at Perkin Elmer-Applied Biosystems as a technical specialist, I worked
at the Imperial Cancer Research Fund from 1990 to 1992 on a Eureka-funded program,
collaborating with Amersham Pharmacia in the United Kingdom and CEPH (Centre d'Etude du
Polymorphisme Humain) in Paris, France, to develop novel nucleic acid purification and
characterization methods.

In 1992, I moved to Pfizer Central Research in the United Kingdom, where I stayed
until 1998, initially setting up a core DNA sequencing facility and then, in 1993, a DNA arraying
facility for gene expression analysis. My work also included bioinformatics, and I was responsible for the
support of all Pfizer neuroscience programs in the United Kingdom. This led me to carry
out detailed bioinformatics and wet lab work on sodium channels, including antibody
generation, western and northern analyses, PCR, tissue distribution studies, and sequence analyses
of the novel sequences identified.

In 1998 I moved to Incyte Genomics, Inc., in the Pharmacogenomics group, looking
at the application of genomics and proteomics to the pharmaceutical industry. In 1999 I was
appointed Director of the LifeExpress Lead Program, which used microarray and protein expression
data to identify pharmacologically and toxicologically relevant mechanisms to assist in improved
drug design and development.

On December 12, 2001 I founded Nuomics Consulting Ltd., in Exning, U.K., and I
am currently employed as its Managing Director. Nuomics Consulting Ltd. will provide expert
technical knowledge and advice to businesses in the areas of genomics, proteomics,
pharmacogenomics, toxicogenomics and chemogenomics.

3. I have reviewed the specification of a United States patent application that I
understand was filed on April 20, 2000 in the names of Olga Bandman et al. and was assigned
Serial No. 09/556,178 (hereinafter "the Bandman '178 application"). Furthermore, I understand
that this United States patent application was a divisional application of and claimed priority to
United States patent application Serial No. 09/368,408 filed on August 4, 1999, which was itself a
divisional application of and claimed priority to United States patent application Serial No.
08/967,364 filed on November 7, 1997 (hereinafter "the Bandman '364 application"), having
essentially the identical specification, with the exception of corrected typographical errors and
reformatting. As a result, page and line numbers may differ between the Bandman '178
application and the Bandman '364 application. My remarks herein will therefore be directed to the
Bandman '364 patent application, and to November 7, 1997 as the relevant date of filing. In broad
overview, the Bandman '364 specification pertains to certain nucleotide and amino acid sequences
and their use in a number of applications, including gene and protein expression monitoring 
applications that are useful in connection with (a) developing drugs (e.g., for the treatment of 
cancer), and (b) monitoring the activity of drugs for purposes relating to evaluating their efficacy 
and toxicity. 

4. I understand that (a) the Bandman '178 application contains claims that are
directed to an isolated polypeptide having the sequence shown as SEQ ID NO:1 (hereinafter "the
SEQ ID NO:1 polypeptide"), and (b) the Patent Examiner has rejected those claims on the grounds
that the specification of the Bandman '178 application does not disclose a substantial, specific and
credible utility for the claimed SEQ ID NO:1 polypeptide. I further understand that whether or not
a patent specification discloses a substantial, specific and credible utility for its claimed subject
matter is properly determined from the perspective of a person skilled in the art to which the
specification pertains at the time the patent application was filed. In addition, I understand that a
substantial, specific and credible utility under the patent laws must be a "real-world" utility.

5. I have been asked (a) to consider, with a view to reaching a conclusion (or
conclusions), whether or not I agree with the Patent Examiner's position that the Bandman '178
application and its parent, the Bandman '364 application, do not disclose a substantial, specific and
credible "real-world" utility for the claimed SEQ ID NO:1 polypeptide, and (b) to state and explain
the bases for any conclusions I reach. I have been informed that, in connection with my
considerations, I should determine whether or not a person skilled in the art to which the Bandman
'364 application pertains on November 7, 1997, would have concluded that the Bandman '364
application disclosed, for the benefit of the public, a specific beneficial use of the SEQ ID NO:1
polypeptide in its then available and disclosed form. I have also been informed that, with respect to
the "real-world" utility requirement, the Patent and Trademark Office instructs its Patent Examiners
in Section 2107.01 of the Manual of Patent Examining Procedure, 8th Edition, August 2001, under
the heading "I. Specific and Substantial Requirements, Research Tools":

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are 
useful in analyzing compounds). An assessment that focuses on whether an 
invention is useful only in a research setting thus does not address whether the 
specific invention is in fact "useful" in a patent sense. Instead, Office personnel 
must distinguish between inventions that have a specifically identified substantial
utility and inventions whose asserted utility requires further research to identify or 
reasonably confirm. 

6. I have considered the matters set forth in paragraph 5 of this Declaration and 
have concluded that, contrary to the position I understand the Patent Examiner has taken, the 
specification of the Bandman '364 patent application disclosed to a person skilled in the art at the 
time of its filing a number of substantial, specific and credible real-world utilities for the claimed 
SEQ ID NO:1 polypeptide. More specifically, persons skilled in the art on November 7, 1997
would have understood the Bandman '364 application to disclose the use of the SEQ ID NO:1
polypeptide as a research tool in a number of gene and protein expression monitoring applications 
that were well-known at that time to be useful in connection with the development of drugs and the 
monitoring of the activity of such drugs. I explain the bases for reaching my conclusion in this 
regard in paragraphs 7-13 below. 

7. In reaching the conclusion stated in paragraph 6 of this Declaration, I 
considered (a) the specification of the Bandman '364 application, and (b) a number of published
articles and patent documents that evidence gene and protein expression monitoring techniques that 
were well-known before the November 7, 1997 filing date of the Bandman '364 application. The 
published articles and patent documents I considered are: 

(a) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Anderson, N.G.,
A Two-Dimensional Gel Database of Rat Liver Proteins Useful in Gene Regulation and Drug
Effects Studies, Electrophoresis, 12, 907-930 (1991) (hereinafter "the Anderson 1991 article")
(copy annexed at Tab A); 

(b) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Mehues, L.,
Raymackers, J., Steiner, S., Witzmann, F., Anderson, N.G., An Updated Two-Dimensional Gel
Database of Rat Liver Proteins Useful in Gene Regulation and Drug Effect Studies, Electrophoresis,
16, 1977-1981 (1995) (hereinafter "the Anderson 1995 article") (copy annexed at Tab B);

(c) Wilkins, M.R., Sanchez, J.-C., Gooley, A.A., Appel, R.D.,
Humphery-Smith, I., Hochstrasser, D.F., Williams, K.L., Progress with Proteome Projects: Why all
Proteins Expressed by a Genome Should be Identified and How To Do It, Biotechnology and
Genetic Engineering Reviews, 13, 19-50 (1995) (hereinafter "the Wilkins article") (copy annexed at
Tab C); 

(d) Celis, J.E., Rasmussen, H.H., Leffers, H., Madsen, P., Honore, B., 
Gesser, B., Dejgaard, K., Vandekerckhove, J., Human Cellular Protein Patterns and their Link to 
Genome DNA Sequence Data: Usefulness of Two-Dimensional Gel Electrophoresis and 
Microsequencing, FASEB Journal, 5, 2200-2208 (1991) (hereinafter "the Celis article") (copy
annexed at Tab D); 

(e) Franzen, B., Linder, S., Okuzawa, K., Kato, H., Auer, G.,
Nonenzymatic Extraction of Cells from Clinical Tumor Material for Analysis of Gene Expression
by Two-Dimensional Polyacrylamide Gel Electrophoresis, Electrophoresis, 14, 1045-1053 (1993)
(hereinafter "the Franzen article") (copy annexed at Tab E); 

(f) Bjellqvist, B., Basse, B., Olsen, E., Celis, J.E., Reference Points for
Comparisons of Two-Dimensional Maps of Proteins from Different Human Cell Types Defined in a
pH Scale Where Isoelectric Points Correlate with Polypeptide Compositions, Electrophoresis, 15,
529-539 (1994) (hereinafter "the Bjellqvist article") (copy annexed at Tab F); 

(g) Large Scale Biology Company Info; LSB and LSP Information; from 
http://www.lsbc.com (2001) (copy annexed at Tab G); and 

(h) Pevsner, J., Hsu, S.-C., Hyde, P.S., Scheller, R.H., Mammalian
Homologues of Yeast Vacuolar Protein Sorting (vps) Genes Implicated in Golgi-to-Lysosome
Trafficking, Gene, 183, 7-14 (1996) (hereinafter "the Pevsner article") (copy annexed at Tab H).

8. Many of the published articles I considered (i.e., at least items (a)-(f) 
identified in paragraph 7) relate to the development of protein two-dimensional gel electrophoretic 
techniques for use in gene expression monitoring applications in drug development and toxicology. 
As I will discuss below, a person skilled in the art who read the Bandman '364 application on 
November 7, 1997 would have understood that application to disclose the SEQ ID NO:1
polypeptide to be useful for a number of gene and protein expression monitoring applications, e.g., 
in the use of two-dimensional polyacrylamide gel electrophoresis and western blot analysis of tissue 
samples in drug development and in toxicity testing. 

9. Turning more specifically to the Bandman '364 specification, the SEQ ID
NO:1 polypeptide is shown at pages 57-59 as one of nine sequences under the heading "Sequence
Listing." The Bandman '364 specification specifically teaches that the "invention features a
substantially purified human vesicle trafficking protein (VTP) comprising an amino acid sequence
selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, and SEQ ID NO:5." (Bandman
'364 application at page 2, lines 23-25.) It further teaches that (a) the identity of the SEQ ID NO:1
polypeptide was determined from a "THP-1 cell line cDNA library (THP1PEB01)," (b) the SEQ ID
NO:1 polypeptide is the vesicle trafficking protein referred to as "VTP-1" and is encoded by SEQ
ID NO:2, and (c) northern analysis shows "the expression of VTP-1 in various cDNA libraries, at
least 42% of which are immortalized or cancerous, at least 24% of which involve immune response,
and at least 29% are expressed in fetal/infant tissues or organs," and therefore VTP-1 appears to play
a role in inflammation and disorders associated with cell proliferation and apoptosis. (Bandman
'364 application at p. 14, lines 5-29.)

The Bandman '364 application discusses a number of uses of the SEQ ID NO:1
polypeptide in addition to its use in gene expression monitoring applications. I have not fully
evaluated these additional uses in connection with the preparation of this Declaration and do not
express any views in this Declaration regarding whether or not the Bandman '364 specification
discloses these additional uses to be substantial, specific and credible real-world utilities of the SEQ
ID NO:1 polypeptide. Consequently, my discussion in this Declaration concerning the Bandman
'364 application focuses on the portions of the application that relate to the use of the SEQ ID NO:1
polypeptide in gene and protein expression monitoring applications. 

10. The Bandman '364 application discloses that the polynucleotide sequences
disclosed therein, including the polynucleotides encoding the SEQ ID NO:1 polypeptide, are useful
as probes in chip-based technologies. It further teaches that the chip-based technologies can be
used "for the detection and/or quantification of nucleic acid or protein" (Bandman '364 application
at p. 25, lines 21-23).

The Bandman '364 application also discloses that the SEQ ID NO:1 polypeptide is
useful in other protein expression detection technologies. The Bandman '364 application states that 
"[a] variety of protocols for detecting and measuring the expression of VTP, using either polyclonal 
or monoclonal antibodies specific for the protein are known in the art. Examples include 
enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence 
activated cell sorting (FACS)." (Bandman '364 application at page 25, line 29 through page 26, line 
2.) Furthermore, the Bandman '364 application discloses that "[a] variety of protocols including 
ELISA, RIA, and FACS for measuring VTP are known in the art and provide a basis for diagnosing 
altered or abnormal levels of VTP expression. Normal or standard values for VTP expression are 
established by combining body fluids or cell extracts taken from normal mammalian subjects,
preferably human, with antibody to VTP under conditions suitable for complex formation."
(Bandman '364 application at page 38, lines 21-25.)

In addition, at the time of filing the Bandman '364 application, it was well known in
the art that gene and protein expression analyses also included two-dimensional polyacrylamide
gel electrophoresis (2-D PAGE) technologies, which were developed during the 1980s, as
exemplified by the Anderson 1991 and 1995 articles (Tab A and Tab B). The Anderson 1991
article teaches that a 2-D PAGE map has been used to connect and compare hundreds of 2-D gels of
rat liver samples from a variety of studies, including studies of the regulation of protein expression
by various drugs and toxic agents (Tab A at p. 907). The Anderson 1991 article teaches an
empirically determined standard curve fitted to a series of identified proteins based upon amino acid
chain length (Tab A at p. 911) and how that standard curve can be used in protein expression analysis.
The Anderson 1991 article also teaches that "there is a long-term need for a comprehensive database of
liver proteins" (Tab A at p. 912).
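By way of illustration only, the general idea of a standard curve of this kind can be sketched as follows. This is a minimal sketch and not the Anderson 1991 procedure: it assumes that the logarithm of molecular mass varies linearly with the vertical gel position of identified reference proteins, fits that line by ordinary least squares, and then reads an unknown spot's apparent mass off the fitted line. The function names and the synthetic calibration points are hypothetical.

```python
import math


def fit_standard_curve(positions, masses):
    """Least-squares fit of log10(mass) = intercept + slope * position.

    Assumption: a log-linear mass/position relationship. positions are relative
    vertical gel positions of identified reference proteins; masses are their
    known molecular masses in daltons (both hypothetical inputs).
    """
    if len(positions) != len(masses) or len(positions) < 2:
        raise ValueError("need at least two paired calibration points")
    ys = [math.log10(m) for m in masses]
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    if sxx == 0:
        raise ValueError("calibration positions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_mass(position, intercept, slope):
    """Read an unknown spot's apparent molecular mass off the fitted curve."""
    return 10 ** (intercept + slope * position)


# Synthetic calibration points generated from a known line (not real gel data),
# so the fit should recover intercept 5 and slope -2.
positions = [0.1, 0.3, 0.5, 0.7, 0.9]
masses = [10 ** (5 - 2 * x) for x in positions]
intercept, slope = fit_standard_curve(positions, masses)
print(round(intercept, 6), round(slope, 6), round(estimate_mass(0.4, intercept, slope)))
```

Generating the calibration points from a known line makes it easy to check that the fit behaves as intended before applying it to measured spot positions.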

The Wilkins article is one of a number of documents, published prior to the
November 7, 1997 filing date of the Bandman '364 application, that describe the use of the 2-D
PAGE technology in a wide range of gene and protein expression monitoring applications,
including monitoring and analyzing protein expression patterns in human cancer, in human serum
and plasma, and in rodent liver following exposure to toxins. In view of the Bandman '364
application, the Wilkins article, and other related pre-November 7, 1997 publications, persons
skilled in the art on November 7, 1997 clearly would have understood the Bandman '364
application to disclose the SEQ ID NO:1 polypeptide to be useful in 2-D PAGE analyses for the
development of new drugs and monitoring the activities of drugs for such purposes as evaluating 
their efficacy and toxicity, as explained more fully in paragraph 12 below. 

With specific reference to toxicity evaluations, those of skill in the art who were
working on drug development on November 7, 1997 (and for many years prior to November 7,
1997) without any doubt appreciated that the toxicity (or lack of toxicity) of any proposed drug they
were working on was one of the most important criteria to be considered and evaluated in
connection with the development of the drug. They would have understood at that time that good
drugs are not only potent, they are specific. This means that they have strong effects on a specific
biological target and minimal effects on all other biological targets. Ascertaining that a candidate
drug affects its intended target, and identifying undesirable secondary effects (i.e., toxic side
effects), had been for many years among the main challenges in developing new drugs. The ability
to determine which genes are positively affected by a given drug, coupled with the ability to
identify quickly, and as early as possible in the drug development process, drugs that are likely to
be toxic because of their undesirable secondary effects, has enormous value in improving the
efficiency of the drug discovery process and is an important and essential part of the development
of any new drug. In fact, the desire to identify and understand toxicological effects using the
experimental assays described above led Dr. Leigh Anderson to found the Large Scale Biology
Corporation in 1985, in order to pursue commercial development of the 2-D electrophoretic protein
mapping technology he had developed. In addition, the company focused on toxicological effects
on the proteome, as clearly demonstrated by its goals and by its senior management credentials
described in company documents (see Tab G at pp. 1, 3, and 5).

Accordingly, the teachings in the Bandman '364 application, in particular regarding
use of SEQ ID NO:1 in differential gene and protein expression analysis (2-D PAGE maps) and in
the development and the monitoring of the activities of drugs, clearly include toxicity studies, and
persons skilled in the art who read the Bandman '364 application on November 7, 1997 would have
understood that to be so.

11. As previously discussed (supra, paragraphs 7 and 8), my experience with
protein analysis methods in the mid-1980s and the several publications annexed to this Declaration
at Tabs A through F evidence information that was available to the public regarding
two-dimensional polyacrylamide gel electrophoresis technology and its uses in drug discovery and
toxicology testing before the November 7, 1997 filing date of the Bandman '364 application. In
particular, the Celis article stated that "protein databases are expected to foster a variety of biological
information . . . among others, drug development and testing" (see Tab D, p. 2200, second
column). The Franzen article shows that 2-D PAGE maps were used to identify proteins in clinical
tumor material (see Tab E). The Bandman '364 application clearly discloses that VTP-1 is
expressed "in various cDNA libraries, at least 42% of which are immortalized or cancerous, at least
24% of which involve immune response, and at least 29% are expressed in fetal/infant tissues or
organs." (Bandman '364 application at page 14, lines 27-29.) The Bjellqvist article showed that a
protein may be identified accurately by its positional coordinates, namely molecular mass and
isoelectric point (see Tab F). The Bandman '364 application clearly disclosed SEQ ID NO:1, from
which it would have been routine for one of skill in the art to predict both the molecular mass and
the isoelectric point using algorithms well known in the art at the time of filing.
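To illustrate the kind of routine calculation referred to above, the following is a minimal sketch that predicts an average molecular mass and an isoelectric point from an amino acid sequence. It assumes an unmodified linear polypeptide, standard average residue masses, and an illustrative set of pKa values combined in a simple Henderson-Hasselbalch charge model; published algorithms (including the pK scale of the Bjellqvist article) use different values and will give somewhat different results. The function names and the short test peptides are hypothetical, and the SEQ ID NO:1 sequence itself is not reproduced here.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water restored for the free N- and C-termini

# Illustrative pKa values (an assumption; published sets differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_mass(seq: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the estimated net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(molecular_mass("GG"), 2), round(isoelectric_point("G"), 2))
```

Bisection is used because, under this charge model, net charge decreases monotonically with pH, so a single zero crossing exists between pH 0 and 14. Post-translational modifications are ignored, so predicted coordinates describe the unmodified polypeptide only.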
12. A person skilled in the art on November 7, 1997, who read the Bandman '364
application, would understand that application to disclose the SEQ ID NO:1 polypeptide to be
highly useful in analysis of differential expression of proteins. For example, the specification of the
Bandman '364 application would have led a person skilled in the art on November 7, 1997 who was
using protein expression monitoring in connection with developing new drugs for the
treatment of inflammation and disorders associated with cell proliferation and apoptosis to conclude
that a 2-D PAGE map that used the isolated SEQ ID NO:1 polypeptide would be a highly useful
tool, and to request specifically that any 2-D PAGE map that was being used for such purposes
utilize the SEQ ID NO:1 polypeptide sequence. Expressed proteins are useful for 2-D PAGE
analysis in toxicology expression studies for a variety of reasons, particularly to provide
controls for the 2-D PAGE analysis and to identify sequence or post-translational
variants of the expressed sequences in response to exogenous compounds. Persons skilled in the art
would appreciate that a 2-D PAGE map that utilized the SEQ ID NO:1 polypeptide sequence would
be a more useful tool than a 2-D PAGE map that did not utilize this protein sequence in connection 
with conducting protein expression monitoring studies on proposed (or actual) drugs for treating 
inflammation and disorders associated with cell proliferation and apoptosis for such purposes as 
evaluating their efficacy and toxicity. 

I discuss in more detail in items (a)-(c) below a number of reasons why a person
skilled in the art, who read the Bandman '364 specification on November 7, 1997, would have
concluded, based on that specification and the state of the art at that time, that the SEQ ID NO:1
polypeptide would be a highly useful tool in 2-D PAGE map analyses for evaluating the
efficacy and toxicity of proposed drugs for inflammation and disorders associated with cell
proliferation and apoptosis, as well as for other evaluations:

(a) The Bandman '364 specification contains a number of teachings that
would lead persons skilled in the art on November 7, 1997 to conclude that a 2-D PAGE map that
utilized the substantially purified SEQ ID NO:1 polypeptide would be a more useful tool for gene
expression monitoring applications relating to drugs for treating inflammation and disorders
associated with cell proliferation and apoptosis than a 2-D PAGE map that did not use the SEQ ID
NO:1 polypeptide sequence. Among other things, the Bandman '364 specification teaches that (i)
the identity of the SEQ ID NO:1 polypeptide was determined from a THP-1 cell line cDNA library
(THP1PEB01), (ii) the SEQ ID NO:1 polypeptide is the vesicle trafficking protein referred to as
VTP-1, and (iii) VTP-1 is expressed in "various cDNA libraries, at least 42% of which are
immortalized or cancerous, at least 24% of which involve immune response, and at least 29% are
expressed in fetal/infant tissues or organs," and, therefore, VTP-1 appears to be involved in vesicle
trafficking and to play a role in inflammation and disorders associated with cell proliferation and
apoptosis. (Bandman '364 application at p. 14, lines 5-29; see paragraph 9, supra.) The Bandman
'364 application teaches that "VTP or a fragment or derivative thereof may be administered to a
subject to prevent or treat a disorder associated with an increase in apoptosis. Such disorders
include, but are not limited to . . . neurodegenerative diseases such as Alzheimer's disease . . . ."
(Bandman '364 application, page 28, lines 1-4.) The isolated polypeptide could therefore be used
as a control to more accurately gauge the expression of VTP-1 in the sample and consequently to
more accurately gauge the effect of a toxicant on expression of the gene.

Moreover, the Bandman '364 specification teaches that SEQ ID NO:1 shares
chemical and structural homology with mouse vacuolar protein-sorting homolog (mVps45) (GI
1703494). VTP-1 and mVps45 share 97% sequence homology and have rather similar
hydrophobicity plots (Bandman '364 application, page 14, lines 23-27, Figures 2A and 2B, and 
Figures 3A and 3B). mVps45 is a mammalian homolog to a yeast protein, Vps45, which "is 
essential for transport from the Golgi to a prevacuolar compartment." (Bandman '364 application, 
page 1, lines 16-17.) The Bandman '364 specification teaches that mammalian homologs to yeast 
vesicle trafficking proteins "are essential in mediating transport among the Golgi complex, synaptic 
vesicles, prelysosomal compartments, and the lysosome." (Bandman '364 specification, page 1, 
lines 24-26.) 

(b) Also pertinent is that a pre-November 7, 1997 article points to the potential
role in Alzheimer's disease of proteins involved in lysosomal trafficking, such as mammalian
homologs (e.g., a human vps45) of yeast vps genes.

The Pevsner article (incorporated by reference into the Bandman '364 specification; Tab H) 
states that: 

A description of the proteins involved in lysosomal targeting is essential to 
understand lysosomal function in the biosynthetic and endocytic pathways, and also 
to understand diseases involving lysosomes. Protein trafficking to lysosomes may be 
disrupted in neurodegenerative disorders such as Alzheimer's disease and prion 
encephalopathies (Mayer et al., 1992; Cataldo et al., 1994) as well as organelle 
storage disorders diseases such as Chediak-Higashi syndrome (Zhao et al., 1994). 
(Tab H, page 14) 
that the protein expression monitoring results obtained using a 2-D PAGE map that utilized a SEQ
ID NO:1 polypeptide would vary, depending on the particular drug being evaluated, and (ii) that
such varying results would occur both with respect to the results obtained from the SEQ ID NO:1
polypeptide and from the 2-D PAGE map as a whole (including all its other individual proteins). 
These kinds of varying results, depending on the identity of the drug being tested, in no way
detract from my conclusion that persons skilled in the art on November 7, 1997, having read the
Bandman '364 specification, would specifically request that any 2-D PAGE map that was being
used for conducting protein expression monitoring studies on drugs for treating inflammation and
disorders associated with cell proliferation and apoptosis (e.g., a toxicology study or any efficacy
study of the type that typically takes place in connection with the development of a drug) utilize the
SEQ ID NO:1 polypeptide sequence. Persons skilled in the art on November 7, 1997 would have
wanted their 2-D PAGE map to utilize the SEQ ID NO:1 polypeptide sequence because a 2-D
PAGE map that utilized protein sequence information for the polypeptide (as compared to one that did
not) would provide more useful results in the kind of gene expression monitoring studies using 2-D
PAGE maps that persons skilled in the art have been doing since well prior to November 7, 1997. 
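Comparisons of this kind between control and drug-treated samples are usually quantitative. As a hedged illustration only, the sketch below computes a log2 fold change for one spot matched between a control gel and a drug-treated gel, after normalizing each spot's volume to the total spot volume on its own gel. The normalization choice, the function names, and the spot identifiers and volumes are assumptions made for illustration; they are not taken from the Bandman '364 application or the annexed articles.

```python
import math


def normalize_volumes(spot_volumes):
    """Express each spot volume as a fraction of the total spot volume on its gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def log2_fold_change(control, treated, spot_id):
    """log2(treated / control) for one spot matched between two gels."""
    c = normalize_volumes(control).get(spot_id, 0.0)
    t = normalize_volumes(treated).get(spot_id, 0.0)
    if c <= 0 or t <= 0:
        raise ValueError(f"spot {spot_id!r} must be detected on both gels")
    return math.log2(t / c)


# Hypothetical spot volumes keyed by hypothetical spot identifiers.
control_gel = {"spot_A": 50.0, "spot_B": 50.0}
treated_gel = {"spot_A": 80.0, "spot_B": 20.0}
print(round(log2_fold_change(control_gel, treated_gel, "spot_A"), 3))
```

Normalizing to total spot volume is one common way to correct for differences in sample loading between gels; a gel whose every spot is uniformly scaled yields a fold change of zero for each spot.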

The foregoing is not intended to be an all-inclusive explanation of all my reasons for
reaching the conclusions stated in this paragraph 12, and in paragraph 6, supra. In my view,
however, it provides more than sufficient reasons to justify my conclusions stated in paragraph 6 of
this Declaration regarding the Bandman '364 application disclosing to persons skilled in the art at
the time of its filing substantial, specific and credible real-world utilities for the SEQ ID NO:1
polypeptide. 

13. Also pertinent to my considerations underlying this Declaration is the fact
that the Bandman '364 disclosure regarding the uses of the SEQ ID NO:1 polypeptide for protein
expression monitoring applications is not limited to the use of that protein in 2-D PAGE maps. For
one thing, the Bandman '364 disclosure regarding the techniques used in gene and protein expression
monitoring applications is broad. (Bandman '364 application at, e.g., page 25, lines 19-23 and page 
38, lines 21-29.) 

In addition, the Bandman '364 specification repeatedly teaches that the proteins
described therein (including the SEQ ID NO:1 polypeptide) may desirably be used in any of a
number of long established "standard" techniques, such as ELISA or western blot analysis, for 
conducting protein expression monitoring studies. See, e.g.: 

(a) Bandman '364 application at page 25, line 29 through page 26, line 2 ("A 
variety of protocols for detecting and measuring the expression of VTP, using either polyclonal or
monoclonal antibodies specific for the protein are known in the art. Examples include 
enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence 
activated cell sorting (FACS)."); and 

(b) Bandman '364 application at page 38, lines 21-29 ("A variety of protocols 
including ELISA, RIA, and FACS for measuring VTP are known in the art and provide a basis for 
diagnosing altered or abnormal levels of VTP expression. Normal or standard values for VTP 
expression are established by combining body fluids or cell extracts taken from normal mammalian 
subjects, preferably human, with antibody to VTP under conditions suitable for complex formation.
The amount of standard complex formation may be quantified by various methods, but preferably
by photometric means. Quantities of VTP expressed in subject, control, and disease samples from
biopsied tissues are compared with the standard values. Deviation between standard and subject 
values establishes the parameters for diagnosing disease."). 
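The quoted procedure, in which normal values are established from normal subjects and a deviation from those values is treated as diagnostic, can be sketched as follows. This is a minimal illustration under stated assumptions: the normal range is taken as the mean plus or minus k sample standard deviations of the normal values, with k = 2 chosen arbitrarily. The Bandman '364 application does not specify a statistical rule, and the function names and readings here are hypothetical.

```python
from statistics import mean, stdev


def reference_range(normal_values, k=2.0):
    """Mean +/- k sample standard deviations of values from normal subjects (assumed rule)."""
    if len(normal_values) < 2:
        raise ValueError("need at least two normal values to estimate spread")
    centre = mean(normal_values)
    spread = stdev(normal_values)
    return centre - k * spread, centre + k * spread


def classify(subject_value, normal_values, k=2.0):
    """Report whether a subject's quantified VTP level falls outside the standard range."""
    low, high = reference_range(normal_values, k)
    if subject_value < low:
        return "below"
    if subject_value > high:
        return "above"
    return "within"


# Hypothetical photometric readings (arbitrary units) from normal subjects.
normals = [9.0, 10.0, 11.0]
print(reference_range(normals), classify(13.0, normals))
```

A wider k reduces false alarms at the cost of sensitivity; in practice the range would be estimated from many more normal samples than shown here.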

Thus a person skilled in the art on November 7, 1997, who read the Bandman '364
specification, would have routinely and readily appreciated that the SEQ ID NO:1 polypeptide
disclosed therein would be useful to conduct gene expression monitoring analyses using 2-D PAGE
mapping or western blot analysis or any of the other traditional membrane-based protein expression
monitoring techniques that were known and in common use many years prior to the filing of the
Bandman '364 application. For example, a person skilled in the art on November 7, 1997 would
have routinely and readily appreciated that the SEQ ID NO:1 polypeptide would be a useful tool in
conducting protein expression analyses, using the 2-D PAGE mapping or western analysis
techniques, in furtherance of (a) the development of drugs for the treatment of inflammation and 
disorders associated with cell proliferation and apoptosis, and (b) analyses of the efficacy and 
toxicity of such drugs. 
14. I declare further that all statements made herein of my own knowledge are
true and that all statements made herein on information and belief are believed to be true; and 
further, that these statements were made with the knowledge that willful false statements and the 
like so made are punishable by fine or imprisonment, or both, and that willful false statements may 
jeopardize the validity of this application and any patent issuing thereon. 




L. Michael Furness, B.Sc. 



Signed at Exning, United Kingdom 
this 10th day of January 2002 



A two-dimensional gel database of rat liver proteins 
useful in gene regulation and drug effects studies 

A standard two-dimensional (2-D) protein map of Fischer 344 rat liver (F344MST3) is presented, with a tabular listing of more than 1200 protein species. Sodium dodecyl sulfate (SDS) molecular mass and isoelectric point have been established, based on positions of numerous internal standards. This map has been used to connect and compare hundreds of 2-D gels of rat liver samples from a variety of studies, and forms the nucleus of an expanding database describing rat liver proteins and their regulation by various drugs and toxic agents. An example of such a study, involving regulation of cholesterol synthesis by cholesterol-lowering drugs and a high-cholesterol diet, is presented. Since the map has been obtained with a widely used and highly reproducible 2-D gel system (the Iso-Dalt® system), it can be directly related to an expanding body of work in other laboratories.
1 Introduction

High-resolution two-dimensional electrophoresis of proteins, introduced in 1975 by O'Farrell and others [1-4], has been used over the ensuing 16 years to examine a wide variety of biological systems, the results appearing in more than 5000 published papers. With the advent of computerized systems for analyzing two-dimensional (2-D) gel images and constructing spot databases, it is also possible to plan and assemble integrated bodies of information describing the appearance and regulation of thousands of protein gene products [5, 6]. Creating such databases involves amassing and organizing quantitative data from thousands of 2-D gels, and requires a substantial commitment in technology and resources.

Given the long-term effort required to develop a protein database, the choice of a biological system takes on considerable importance. While in vitro systems are ideal for answering many experimental questions, especially in cancer research and genetics, our experience with cell cultures and tissue samples suggests that some in vivo approaches could have major advantages. In particular, we have noticed that liver tissue samples from rats and mice appear to show greater quantitative reproducibility (in terms of individual protein expression) than replicate cell cultures. This is perhaps a natural result of the homeostasis maintained in a complete animal vs. the well-known variability of cell cultures, the latter due principally to differences in reagents (e.g., fetal bovine serum), conditions (e.g., pH) and genetic "evolution" of cell lines while in culture. It is also more difficult to generate adequate amounts of protein from cell culture systems (particularly with attached cells), forcing the investigator to resort to radioisotope-based or silver-based stain detection methods. While these methods are more sensitive (sometimes much more sensitive) than the Coomassie Brilliant Blue (CBB) stain typically used for protein detection in "large" protein samples, they are generally more variable, more labor-intensive and, in the case of radiographic methods, may generate highly "noisy" images, due to the properties of the films used. By contrast, large protein samples can easily be prepared from liver using urea/Nonidet P-40 (NP-40) solubilization and stained with CBB, which has the advantage of being easily reproducible [8]. Finally, there remains the question of the "truthfulness" of many in vitro systems as compared to their in vivo analogs: how great are the changes caused by the introduction into culture and the associated shift to strong selection for growth, and how do these affect experimental outcomes? Hence the apparent advantages of in vitro systems, in terms of experimental manipulation, may be counterbalanced by other factors relating to 2-D data quality.

There is a second important class of reasons for exploring the use of an in vivo biological system such as the liver. Historically, there have been two broad approaches to the mechanistic dissection of biochemical processes in intact cellular systems: genetics (a search for informative mutants) and the use of chemical agents (drugs and chemical toxins). Both approaches help us to understand complex systems by disrupting some specific functional element and showing us the result. With the development of techniques for genetic manipulation and cloning, the genetic approach can be effectively applied either in vitro or in vivo, although the in vitro route is usually quicker. The chemical approach can also be applied to either sort of biological system; here, however, the bulk of consistently acquired information is in experimental animals (rats and mice). While most biologists know a short list of compounds having specific, experimentally useful effects (inhibitors of protein synthesis, ionophores, polymerase inhibitors, channel blockers, nucleotide analogs, and compounds affecting polymerization of cytoskeletal proteins), there is a much larger number of interesting chemically induced effects, most of them characterized by toxicologists and pharmacologists in rodent systems. Just as a thorough genetic analysis would involve saturating a genome with mutations, it is possible to imagine a saturating number of drugs, the analysis of whose actions would reveal the complete biochemistry of the cell. While organized drug discovery efforts usually target specific desired effects, the nature of the process, with its dependence on screening large numbers of compounds, necessarily produces many unanticipated effects. It is therefore reasonable to suppose that the broad range of compounds necessary to achieve "biochemical saturation" may be forthcoming; in fact, it may already exist among the hundreds of thousands of compounds that failed to qualify as drugs.

Among organs, the liver is an obvious choice for the study of chemical effects because of its well-known plasticity and responsiveness. The brain appears to be quite plastic (e.g., [7]), but it is a complicated mixture of cell types requiring skillful dissection for most experiments. The kidney, while quite responsive, also presents a potentially confounding mixture of cell types. The liver, by contrast, is made up of one predominant cell type which is easy to solubilize: the hepatocyte, representing more than 95% of its mass. Most importantly, the liver performs many homeostatic functions that require rapid modulation of gene expression. It appears that most chemical agents tested affect gene expression in the liver at some dosage (N. Leigh Anderson, unpublished observations), an interesting contrast to our earlier work with lymphocytes, for example, which seem to be much less responsive. Such results conform to the expectation that cells with a homeostatic, physiological role should be more plastic than cells differentiated for a purpose dependent on the action of a limited number of specific genes.

The liver also allows the parallels between in vitro and in vivo systems to be examined in detail. Significant progress has been made in the development of mouse, rat and human hepatocyte culture systems, as well as in precision-cut tissue slices. Using such an array of techniques, it is possible to assemble a matrix of mammalian systems including mouse and rat in vivo on one level and mouse, rat and human in vitro on a second level, and to compare effects between species and between systems. This approach allows us to draw informed conclusions regarding the biochemical "universality" of biological responses among the mammals, and to offer some insight into the validity of in vitro approaches for toxicological screening. We believe this will be necessary if in vitro alternatives are to achieve wide usage in government-mandated safety testing of drugs, consumer products, and industrial and agricultural chemicals.

A number of interesting studies have been published using 2-D mapping to examine effects in the rodent liver. A number of investigators have made use of the technique to screen for existing genetic variants [8-11] or induced mutations [12-14], mainly in the mouse. This work builds on the wealth of genetic information available on the mouse and its established position as a mammalian mutation-detection system. While some studies of chemical effects have been undertaken in the mouse [15-17], most have used the rat [18-23]. The examination of the cytochrome P-450 system, in particular, has been carried out almost exclusively on the rat [24, 25].

These considerations lead us to conclude that rodent liver offers the best opportunity to systematically examine an array of gene regulation systems, and ultimately to build a predictive model of large-scale mammalian gene control. The basic underlying foundation of such a project is a reliable, reproducible master 2-D pattern of liver, to which ongoing experimental results can be referred. In this paper we report such a master pattern for the acidic and neutral proteins of rat liver (pattern F344MST3). In future, this master will be supplemented by maps of basic proteins, and analogous maps of mouse and human liver.



2 Materials and methods 
2.1 Sample preparation 

Liver is an ideal sample material for most biochemical studies, including 2-D analysis. A sample of approximately 0.5 g of tissue is taken from the apical end of the left lobe of the liver. Solubilization is effected as rapidly as practical: a delay of 5-15 min appears to cause no major alteration in liver protein composition if the liver pieces are kept cold (e.g., on ice) in the interim. In the solubilization process, the liver sample is weighed and placed in a glass homogenizer (e.g., 15 mL Wheaton); 8 volumes of solubilizing solution* are added (i.e., 4 mL per 0.5 g tissue) and the mixture is homogenized using first the loose- and then the tight-fitting glass pestle. This takes approximately 5 strokes with each pestle and is carried out at room temperature because urea would crystallize out in the cold. Once the liver sample is thoroughly homogenized in the solubilizer, it is assumed that all the proteins are denatured (by the chaotropic effect of the urea and NP-40 detergent) and the enzymes inactivated by the high pH (~9.5). Therefore these samples may be kept at room temperature until they can be centrifuged or frozen as a group (within several hours of preparation). The samples are centrifuged for 6 × 10^6 g·min (e.g., 500 000 × g for 12 min using a Beckman TL-100 centrifuge). The centrifuge rotor is maintained at just below room temperature (e.g., 15-20 °C), but not colder, so as to prevent the precipitation of urea. The centrifuge of choice is a Beckman TL-100 because of the sample tube sizes available, but any ultracentrifuge accepting smallish tubes will suffice. When an appropriate centrifuge is not available near the site of sample preparation, samples can be frozen at -80 °C and thawed prior to centrifugation and collection of supernatants. Each supernatant is carefully removed following centrifugation and aliquoted into at least 4 clean tubes for storage. This is done by transferring all the supernatant to one clean tube, mixing it gently (to assure homogeneous composition) and then dividing it into 4 aliquots. The aliquots are frozen immediately at -80 °C. These multiple aliquots provide insurance against a failed run or a freezer breakdown.

* The solubilizing solution is composed of 9 M urea (analytical grade, e.g., BDH or Bio-Rad), 2% NP-40 (Sigma), 0.5% dithiothreitol (Sigma) and 2% carrier ampholytes (pH 9-11, LKB; these come as a stock solution, so a 2% final concentration is achieved by making the solution 10% pH 9-11 Ampholine by volume). A large batch of solubilizer (several hundred mL) is made and stored frozen at -80 °C in aliquots sufficient to provide enough for one day's estimated sample solubilization requirement. The solution is never allowed to become warmer than room temperature at any stage during preparation or thawing for use, since heating of concentrated urea solutions can produce cyanate, which covalently modifies proteins, producing artifactual charge shifts. Once thawed, any unused solubilizer is discarded.
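The solubilizer volume and the centrifugation dose above are simple products, and a short sketch makes the arithmetic explicit. This is a minimal Python illustration, assuming (as the "4 mL per 0.5 g" figure implies) that one "volume" means 1 mL per g of tissue; the function names are hypothetical and are not part of any LSB software.

```python
def solubilizer_volume_ml(tissue_g: float, volumes: float = 8.0) -> float:
    """Solubilizer to add, assuming 1 'volume' = 1 mL per g of tissue."""
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    return tissue_g * volumes


def g_minutes(rcf_g: float, minutes: float) -> float:
    """Centrifugation dose as relative centrifugal force x time (g·min)."""
    return rcf_g * minutes


def minutes_for_dose(target_g_min: float, rcf_g: float) -> float:
    """Run time needed to reach a target g·min dose at a given RCF."""
    return target_g_min / rcf_g


if __name__ == "__main__":
    print(solubilizer_volume_ml(0.5))     # 4.0 (mL for a 0.5 g sample)
    print(g_minutes(500_000, 12))         # 6000000 (the TL-100 example)
    print(minutes_for_dose(6e6, 80_000))  # 75.0 (min at 80 000 x g)
```

For comparison, the 30 min at 80 000 × g used in Experiment LSBC04 (Section 2.7) corresponds to `g_minutes(80_000, 30)`, i.e. 2.4 × 10^6 g·min.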


2.2 Two-dimensional electrophoresis

Sample proteins are resolved by 2-D electrophoresis using the 20 × 25 cm Iso-Dalt® 2-D gel system ([26-29]; produced by LSB and by Hoefer Scientific Instruments, San Francisco) operating with 20 gels per batch. All first-dimensional isoelectric focusing (IEF) gels are prepared using the same single standardized batch of carrier ampholytes (pH 4-8A in the present case, selected by LSB's batch-testing program for rat and mouse database work**). A 10 µL sample of solubilized liver protein is applied to each gel, and the gels are run for 33 000 to 34 500 volt-hours using a progressively increasing voltage protocol implemented by a programmable high-voltage power supply. An "Angelique" computer-controlled gradient-casting system (produced by LSB) is used to prepare second-dimensional sodium dodecyl sulfate (SDS) polyacrylamide gradient slab gels in which the top 5% of the gel is 11%T acrylamide, and the lower 95% of the gel varies linearly from 11% to 18%T.
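The slab gradient just described (a constant 11%T cap over the top 5% of the gel, then a linear 11-18%T ramp) can be written as a piecewise function of depth. The sketch below assumes depth is expressed as a fraction of total gel height measured from the top edge; it describes the nominal profile only and is not a model of the Angelique casting hardware.

```python
def acrylamide_percent_t(depth_fraction: float,
                         cap_fraction: float = 0.05,
                         low_t: float = 11.0,
                         high_t: float = 18.0) -> float:
    """Nominal %T at a fractional depth (0 = top edge, 1 = bottom edge)."""
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie between 0 and 1")
    if depth_fraction <= cap_fraction:
        return low_t  # uniform 11 %T cap over the top 5% of the slab
    # Linear 11 -> 18 %T ramp over the remaining 95% of the height.
    ramp_position = (depth_fraction - cap_fraction) / (1.0 - cap_fraction)
    return low_t + (high_t - low_t) * ramp_position


if __name__ == "__main__":
    print([round(acrylamide_percent_t(d), 2) for d in (0.0, 0.05, 0.525, 1.0)])
    # [11.0, 11.0, 14.5, 18.0]
```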

This system has recently been modified so as to employ a commercially available 30.8%T acrylamide/N,N'-methylenebisacrylamide prepared solution (thus avoiding the handling of the solid acrylamide monomer) and three additional stock solutions: buffer (made from Sigma pre-set Tris), persulfate and N,N,N',N'-tetramethylethylenediamine (TEMED). Each gel is identified by a computer-printed filter paper label polymerized into the lower left corner of the gel. First-dimensional IEF tube gels are loaded directly (as extruded) onto the slab gels without equilibration, and held in place by polyester fabric wedges (Wedgies®, produced by LSB) to avoid the use of hot agarose. Second-dimensional slab gels are run overnight, in groups of 20, in cooled DALT tanks (10 °C) with buffer circulation. All run parameters, reagent source and lot information, and notations of deviation from expected results are entered by the technician responsible on a detailed, multi-page record of the experiment.

** This material (succeeding certified batches of which are available from Hoefer Scientific Instruments) has the most linear pH gradient produced by any ampholyte tested except for the Pharmacia wide range (which has an unacceptable tendency to bind high-molecular-weight acidic proteins, causing them to streak).
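The first-dimension run length in Section 2.2 is specified only as a total of 33 000 to 34 500 volt-hours. For a stepwise voltage ramp, that total is the sum of voltage × time over the steps, as in the sketch below. The step protocol in the example is entirely hypothetical; the source does not give the actual program of the power supply.

```python
from typing import Iterable, Tuple

Step = Tuple[float, float]  # (volts, hours) held constant within one step


def volt_hours(steps: Iterable[Step]) -> float:
    """Total V·h delivered by a stepwise protocol."""
    return sum(volts * hours for volts, hours in steps)


def in_target_range(total: float, low: float = 33_000, high: float = 34_500) -> bool:
    """True if the run falls inside the 33 000-34 500 V·h window."""
    return low <= total <= high


if __name__ == "__main__":
    # Hypothetical progressively increasing protocol (not the LSB program).
    protocol = [(100, 1), (300, 2), (700, 4), (1000, 30)]
    total = volt_hours(protocol)
    print(total, in_target_range(total))  # 33500 True
```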

2.3 Staining

Following SDS electrophoresis, slab gels are stained for protein using a colloidal Coomassie Blue G-250 procedure in covered plastic boxes, with 10 gels (totalling approximately 1 L of gel) per box. This procedure (based on the work of Neuhoff [30, 31]) involves fixation in 1.5 L of 50% ethanol and 2% phosphoric acid for 2 h, three 30 min washes, each in 2 L of cold tap water, and transfer to 1.5 L of 34% methanol, 17% ammonium sulfate and 2% phosphoric acid for 1 h, followed by the addition of a gram of powdered Coomassie Blue G-250 stain. Staining requires approximately 4 days to reach equilibrium intensity, whereupon gels are transferred to cool tap water and their surfaces rinsed to remove any particulate stain prior to scanning. Gels may be kept for several months in water with added sodium azide. The water washes remove ethanol that would dissolve the stain (and render the system noncolloidal, with high backgrounds). The concentrated ammonium sulfate and methanol solution is diluted by equilibration with the water volume of the gels to automatically achieve the correct final concentrations for colloidal staining. Practical advantages of this staining approach can be summarized as follows: (i) the low, flat background makes computer evaluation of small spots (max OD < 0.02) possible, especially when using laser densitometry; (ii) up to 1500 spots can be reliably detected on many gels (e.g., rat liver) at loadings low enough to preserve excellent resolution; and (iii) reproducibility appears to be very good: at least several hundred spots have coefficients of reproducibility less than 15%. This value is at least as good as previous CBB methods, and significantly better than many silver stain systems.
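Two quantities in this section lend themselves to a quick calculation. The sketch below assumes the per-spot "coefficient of reproducibility" is a sample coefficient of variation (standard deviation divided by mean across replicate gels), and estimates the dilution of the staining concentrate by the gels' own water assuming the ~1 L of gel behaves as 1 L of pure water and ignoring volume contraction on mixing methanol and water. It is a rough check, not a specification of the protocol.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample CV (%) of one spot's abundance across replicate gels."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return stdev(values) / mean(values) * 100.0


def diluted_percent(stock_percent: float, stock_litres: float,
                    added_water_litres: float) -> float:
    """Final concentration after equilibrating with water (simple mass balance)."""
    return stock_percent * stock_litres / (stock_litres + added_water_litres)


if __name__ == "__main__":
    print(round(coefficient_of_variation([90, 100, 110]), 1))  # 10.0
    # 1.5 L of concentrate equilibrated with ~1 L of gel (assumed pure water)
    print(round(diluted_percent(34, 1.5, 1.0), 1))  # 20.4 (% methanol)
    print(round(diluted_percent(17, 1.5, 1.0), 1))  # 10.2 (% ammonium sulfate)
```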

2.4 Positional standardization 

The carbamylated rabbit muscle creatine phosphokinase (CPK) standards [32] are purchased from Pharmacia and BDH. Amino acid compositions, and numbers of residues present in proteins used for internal standardization, are taken from the Protein Identification Resource (PIR) sequence database [33].



2.5 Computer analysis

Stained slab gels are digitized in red light at 134 micron resolution, using either a Molecular Dynamics laser scanner (with pixel sampling) or an Eikonix 78/99 CCD scanner. Raw digitized gel images are archived on high-density DAT tape (or equivalent storage media) and a grayscale videoprint is prepared from the raw digital image as a hard-copy backup of the gel image. Gels are processed using the Kepler® software system (produced by LSB), a commercially available workstation-based software package built on some of the principles of the earlier TYCHO system [34-41]. Procedure PROC008 is used to yield a spotlist giving position, shape and density information for each detected spot. This procedure makes use of digital filtering, mathematical morphology techniques and digital masking to remove the background, and uses full 2-D least-squares optimization to refine the parameters of a 2-D Gaussian shape for each spot. Processing parameters and file locations are stored in a relational database, while various log files detailing operation of the automatic analysis software are archived with the reduced data. The computed resolution and level of Gaussian convergence of each gel are inspected and archived for quality control purposes.
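The Gaussian spot model can be illustrated with a small sketch. PROC008 performs a full nonlinear 2-D least-squares optimization, which is not reproduced here. Instead, this example uses a simpler, swapped-in technique: a linear least-squares fit of a quadratic to the logarithm of background-subtracted pixel intensities, which is exact only for an axis-aligned, noise-free Gaussian. It shows which parameters such a fit yields (centre, widths, peak height, integrated volume). It assumes NumPy is available, that the background has already been removed, and that the image contains a single spot.

```python
import numpy as np


def fit_gaussian_spot(image: np.ndarray, threshold: float = 1e-6) -> dict:
    """Fit I = A*exp(-(x-x0)^2/(2 sx^2) - (y-y0)^2/(2 sy^2)) via a log-quadratic fit."""
    rows, cols = np.nonzero(image > threshold)  # keep pixels above background
    z = np.log(image[rows, cols])
    x = cols.astype(float)
    y = rows.astype(float)
    # ln I = c0 + c1*x + c2*y + c3*x^2 + c4*y^2 is linear in the coefficients.
    design = np.column_stack([np.ones_like(x), x, y, x**2, y**2])
    (c0, c1, c2, c3, c4), *_ = np.linalg.lstsq(design, z, rcond=None)
    if c3 >= 0 or c4 >= 0:
        raise ValueError("pixels do not describe a peaked spot")
    x0, y0 = -c1 / (2 * c3), -c2 / (2 * c4)
    sx, sy = np.sqrt(-1 / (2 * c3)), np.sqrt(-1 / (2 * c4))
    amplitude = np.exp(c0 - c3 * x0**2 - c4 * y0**2)
    return {
        "x0": x0, "y0": y0, "sx": sx, "sy": sy,
        "amplitude": amplitude,
        "volume": 2 * np.pi * amplitude * sx * sy,  # integrated spot density
    }


if __name__ == "__main__":
    yy, xx = np.mgrid[0:41, 0:41]
    spot = 5.0 * np.exp(-((xx - 20.3) ** 2) / (2 * 3.0**2)
                        - ((yy - 18.7) ** 2) / (2 * 2.0**2))
    print({k: round(float(v), 3) for k, v in fit_gaussian_spot(spot).items()})
```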

Experiment packages are constructed using the Kepler® experiment definition database to assemble groups of 2-D patterns corresponding to the experimental groups (e.g., treated and control animals). Each 2-D pattern is matched to the appropriate "master" 2-D pattern (pattern F344MST3 in the case of Fischer 344 rat liver), thereby providing linkage to the existing rodent protein 2-D databases. The software allows experiments containing hundreds of gels to be constructed and analyzed as a unit, with up to 100 gels displayed on the screen at one time for comparative purposes and multiple pages to accommodate experiments of > 1000 gels. For each treatment, proteins showing significant quantitative differences vs. appropriate controls are selected using group-wise statistical parameters (e.g., Student's t-test, Kepler® procedure STUDENT). Proteins satisfying various quantitative criteria (such as P < 0.001 difference from appropriate controls) are represented as highlighted spots onscreen or on computer-plotted protein maps and stored as spot populations (i.e., logical vectors) in a liver protein database. Quantitative data (spot parameters, statistical or other computed values) are stored as real-valued vectors in the database. Analysis of coregulation is performed using a Pearson product-moment correlation (Kepler® procedure CORREL) to determine whether groups of proteins are coordinately regulated by any of the treatments. Such groups can be presented graphically on a protein map, and reported together with the statistical criteria used to assess the level of coregulation. Multivariate statistical analysis (e.g., principal components analysis) is performed on data exported to SAS (SAS Institute).
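The spot-selection step can be sketched with a pooled-variance (Student's) two-sample t statistic. This standard-library Python example is an illustration, not Kepler's STUDENT procedure. To avoid a statistics-package dependency it compares |t| against a caller-supplied two-tailed critical value instead of computing P; for instance, with five treated and five control animals (8 degrees of freedom), P < 0.001 two-tailed corresponds to |t| > 5.041. The spot labels and abundances in the example are invented.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence


def student_t(treated: Sequence[float], control: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(treated), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two gels")
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))


def select_changed_spots(treated: Dict[str, List[float]],
                         control: Dict[str, List[float]],
                         t_critical: float) -> List[str]:
    """Spots whose |t| exceeds the critical value for the chosen P level."""
    return sorted(
        spot for spot in treated
        if spot in control and abs(student_t(treated[spot], control[spot])) > t_critical
    )


if __name__ == "__main__":
    treated = {"spot_a": [50, 55, 60, 52, 58], "spot_b": [10, 11, 9, 10, 12]}
    control = {"spot_a": [10, 11, 9, 10, 12], "spot_b": [10, 12, 11, 9, 10]}
    print(select_changed_spots(treated, control, t_critical=5.041))  # ['spot_a']
```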

2.6 Graphical data output

Graphical results are prepared in GKS and translated within Kepler® into output for any of a variety of devices. Line-drawing output is typically prepared as PostScript and printed on an Apple LaserWriter. Detailed maps presented here have been generated using an ultra-high-resolution PostScript-compatible Linotronic output device. Greyscale graphics are reproduced from the workstation screen using a Seikosha videoprinter. Patterns are shown in the standard orientation, with high molecular mass at the top and acidic proteins to the left.

2.7 Experiment LSBC04

In the study described here, 12-week-old Charles River male F344 rats were used. Diets were prepared at LSB, based on a Purina 5755M Basal Purified Diet. Lovastatin and cholestyramine were obtained as prescription pharmaceuticals, ground and mixed with the diets at concentrations of 0.075% and 1%, respectively. The high-cholesterol diet was Purina 5801M-A (5% cholesterol plus 1% […] in the control diet). Animal work was carried out at Microbiological Associates (Bethesda, MD). Animals were acclimatized for one week on the control diet, fed test or control diets for one week, and sacrificed on day […]. The daily doses of lovastatin and cholestyramine were 37 mg/kg/day and 5 g/kg/day, respectively, based on the weight of the food consumed. Liver samples were collected and prepared for 2-D electrophoresis according to the standard liver protocol (homogenization in 8 volumes of 9 M urea, 2% NP-40, 0.5% dithiothreitol, 2% LKB pH 9-11 carrier ampholytes, followed by centrifugation for 30 min at 80 000 × g). Kidney […] samples were frozen. Gels were run as described above and the data were analyzed using the Kepler® system. Gels were scaled, to remove the effect of differences in protein loading, by setting the summed abundances of a large number of matched spots equal for each gel (linear scaling).
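Linear scaling of the kind just described can be sketched as follows: each gel is multiplied by a single factor so that the summed abundance of a chosen set of matched spots becomes the same on every gel. The choice of reference spots is left to the caller, and the common target (here the mean of the per-gel sums) is an assumption; the source does not say what value the sums were set to. Gel and spot identifiers in the example are invented.

```python
from statistics import mean
from typing import Dict, Iterable, Optional

Gel = Dict[str, float]  # spot number -> integrated abundance


def linear_scale(gels: Dict[str, Gel], reference_spots: Iterable[str],
                 target_total: Optional[float] = None) -> Dict[str, Gel]:
    """Scale each gel so its reference-spot abundances sum to a common total."""
    refs = list(reference_spots)
    totals = {}
    for gel_id, spots in gels.items():
        missing = [s for s in refs if s not in spots]
        if missing:
            raise KeyError(f"gel {gel_id} lacks matched spots {missing}")
        totals[gel_id] = sum(spots[s] for s in refs)
    if target_total is None:
        target_total = mean(totals.values())  # assumed common target
    return {
        gel_id: {s: v * target_total / totals[gel_id] for s, v in spots.items()}
        for gel_id, spots in gels.items()
    }


if __name__ == "__main__":
    gels = {"gel_a": {"1": 10, "2": 30, "3": 5}, "gel_b": {"1": 20, "2": 60, "3": 12}}
    print(linear_scale(gels, ["1", "2"]))
```

Because the factor is applied to every spot on a gel, relative abundances within a gel are unchanged; only between-gel loading differences are removed.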

3 Results and discussion 

3.1 The rat liver protein 2-D map 

F344MST3 is a standard 2-D pattern of rat liver proteins based on the Fischer 344 strain. This pattern was initiated from a single 2-D gel and extensively edited in an experiment comparing it to a range of protein loads, so as to include both small spots and well-resolved representations of high-abundance spots. More than 700 rat liver 2-D patterns have been matched to F344MST3 in a series of drug effects and protein characterization experiments, and numerous new spots (induced by specific drugs, for instance) have been added as a result. A modified version including additional spots present in the Sprague-Dawley outbred rat has also been developed (data not shown). Figure 1 shows a greyscale representation and Fig. 2 a schematic plot of the master pattern. More than 1200 spots are included, most of which are visible on typical gels loaded with 10 µL of solubilized liver protein prepared by the standard method and stained with colloidal Coomassie Blue. Master spot numbers (MSNs) have been assigned to all proteins, and appear in the following figures, each showing one quadrant of the pattern. Figure 3 shows the upper left (acidic, high molecular mass) quadrant, Fig. 4 the upper right (basic, high molecular mass) quadrant, Fig. 5 the lower left (acidic, low molecular mass) quadrant, and Fig. 6 the lower right (basic, low molecular mass) quadrant. The quadrants overlap as an aid to moving between them. The gel position (in 100 micron units), isoelectric point (relative to the CPK internal pI standards) and SDS molecular mass (from the calibration curve in Fig. 8) are listed for each spot (Table 1). Because of the precision of the CPK pI values, these parameters can be used to relate spot locations between gel systems more reliably than pI measurements expressed as pH. A major objective of current studies is the identification of all major spots corresponding to known liver proteins, as well as rigorous definitions of subcellular organelle contents. Of particular interest to us is the parallel development of identifications in the rat and mouse liver maps, allowing detailed comparisons of gene expression effects in the two systems. The results of these studies will be presented systematically in a later edition of this database.
We include here a useful series of 22 orienting identifications as an aid to other users of the rat liver pattern (Table …).

3.2 Carbamylated charge standards, computed pIs and molecular mass standardization

We have previously shown that the use of a system of closely spaced internal pI markers (made by carbamylating a protein) offers an accurate and workable solution to the problem of assigning positions in the pI dimension [32]. The same system, based on 36 protein species made by carbamylating rabbit muscle CPK, has been used here to assign pIs to most rat liver acidic and neutral proteins. The standards were coelectrophoresed with total liver proteins, and the standard spots added to a special version of the master pattern F344MST3. The gel X coordinates of all liver protein spots lying within the CPK charge train were then transformed into CPK pI positions by interpolation between the positions of immediately adjacent standards (Table 1) using a Kepler® vector procedure.
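The interpolation step can be sketched as piecewise-linear interpolation between the two CPK standards that flank a spot's gel X coordinate. This is an illustration, not the Kepler® vector procedure. The standard positions and CPK pI values in the example are hypothetical (the real values are those of Table 1), and spots outside the charge train are rejected rather than extrapolated, matching the text's restriction to spots lying within the train.

```python
from bisect import bisect_right
from typing import List, Tuple

Standard = Tuple[float, float]  # (gel X coordinate, CPK pI value)


def cpk_pi(x: float, standards: List[Standard]) -> float:
    """Interpolate a spot's CPK pI from the two flanking carbamylation standards."""
    train = sorted(standards)
    xs = [sx for sx, _ in train]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("spot lies outside the CPK charge train")
    i = bisect_right(xs, x)
    if i == len(xs):  # exactly on the last standard
        return train[-1][1]
    (x_lo, p_lo), (x_hi, p_hi) = train[i - 1], train[i]
    return p_lo + (p_hi - p_lo) * (x - x_lo) / (x_hi - x_lo)


if __name__ == "__main__":
    hypothetical_train = [(100.0, 0.0), (140.0, 1.0), (175.0, 2.0), (215.0, 3.0)]
    print(cpk_pi(157.5, hypothetical_train))  # 1.5
```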

It has proven possible to compute fairly accurate pI values of many proteins from the amino acid composition [42]. We have attempted here to test a further elaboration of this approach, in which we computed pIs for the CPK standards themselves, based on our knowledge of the rabbit muscle CPK sequence and the fact that adjacent members of the charge train typically differ by blockage of one additional lysine residue (Table 3). We compared these values to similar computed pIs for an additional set of carbamylated standards made from human hemoglobin beta chains and a series of rat liver and human plasma proteins of known position and sequence (Fig. 7, Table 4). The result demonstrates good concordance between these systems. Two proteins show significant deviations: liver fatty-acid binding protein (FABP; #1 in Table 4) and protein disulphide isomerase (#… in the table). The FABP spot present on F344MST3 may represent a charge-modified version of a more basic parent spot closer to the expected pI, not resolved in the IEF/SDS gel. Of particular importance is the fact that, by comparing computed pIs of sequenced but unlocated proteins with the CPK pIs, we can assign a probable gel location without making any assumptions regarding the actual gel pH gradient. This offers a useful shortcut, given the vagaries of pH measurement on small-diameter IEF gels. We have used this approach to compute the CPK pIs of all rat and mouse proteins in the PIR sequence databases as an aid to protein identification (data not shown).
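A composition-based pI calculation of the general kind referred to above can be sketched with the Henderson-Hasselbalch equation: sum the fractional charges of all ionizable groups at a given pH, then bisect for the pH of zero net charge. Carbamylation is modelled by removing one lysine's positive charge per blocked residue, which mirrors the charge-train logic in the text. The pKa values below are illustrative values of the kind used by common pI calculators and are an assumption; they are not the values of ref. [42], and the composition in the test is invented.

```python
from typing import Dict

# Illustrative pKa values (assumption; not those of ref. [42]).
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(ph: float, counts: Dict[str, int]) -> float:
    """Henderson-Hasselbalch net charge of a chain with the given group counts."""
    positive = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(composition: Dict[str, int], carbamylated_lys: int = 0,
                      tol: float = 1e-4) -> float:
    """pI by bisection; each carbamylated Lys loses its positive charge."""
    counts = dict(composition, Nterm=1, Cterm=1)  # one free terminus of each kind
    if not 0 <= carbamylated_lys <= counts.get("K", 0):
        raise ValueError("cannot block more lysines than the chain contains")
    counts["K"] = counts.get("K", 0) - carbamylated_lys
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(mid, counts) > 0:
            low = mid  # still net positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2
```

Each additional blocked lysine lowers the computed pI, which is why successive members of a carbamylation train run progressively more acidic.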



In order to standardize SDS molecular weight (SDS-MW), we have used a standard curve fitted to a series of identified proteins (Fig. 8). Rather than using molecular mass per se, we have elected to use the number of amino acids in the polypeptide chain, as perhaps a better indication of the length of the SDS-coated rod that is sieved by the second-dimension slab. The resulting values were multiplied by … (the weighted average mass of amino acids in sequenced proteins) to give predicted molecular masses. Because we use gradient slabs, we have not constrained the fitted curve to conform to any predetermined model; rather, we tried many equations and selected the best using the program "TableCurve" on a PC. The equation chosen was

$$y = a + bx + \frac{c}{x^{3}}$$

where $y$ is the number of residues, $x$ is the gel Y coordinate, $a = 511.83$, $b = -0.2731$ and $c = 331838801$. The resulting fit appears to be fairly good over a broad range of molecular mass.
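The calibration equation can be evaluated directly, as in the sketch below. Several assumptions apply: the coefficients are those given above, but the digits of c were read from a degraded scan and should be checked against Fig. 8; x must be in the same gel Y-coordinate units used for the fit, which the text does not restate; and the mean residue mass is a required argument because its value is illegible in the source. The 110 Da used in the example is a commonly quoted rough figure, not the authors' value. The curve should only be applied within the range spanned by the calibration proteins.

```python
A, B, C = 511.83, -0.2731, 331_838_801.0  # fitted coefficients from the text


def residues_from_y(y_coord: float) -> float:
    """Predicted number of residues at gel Y coordinate x: a + b*x + c/x**3."""
    if y_coord <= 0:
        raise ValueError("Y coordinate must be positive")
    return A + B * y_coord + C / y_coord**3


def predicted_mass_da(y_coord: float, mean_residue_mass_da: float) -> float:
    """Predicted SDS molecular mass = residues x mean residue mass."""
    return residues_from_y(y_coord) * mean_residue_mass_da


if __name__ == "__main__":
    for y in (500, 1000, 1500):
        # 110 Da is a placeholder, not the value used by the authors.
        print(y, round(residues_from_y(y)), round(predicted_mass_da(y, 110.0)))
```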

3.3 An example of rat liver gene regulation: Cholesterol metabolism

Experiment LSBC04 was designed as a small-scale test of the regulation of cholesterol metabolism in vivo by three agents included in the diet: lovastatin (Mevacor®, an inhibitor of HMG-CoA reductase); cholestyramine (a bile acid sequestrant that has the effect of removing cholesterol from the gut-liver recirculation); and cholesterol itself. The first two agents should lower available cholesterol and the third should raise it, allowing manipulation of relevant gene expression control systems in both directions. Such an experiment offers an interesting test of the 2-D mapping system since most of the pathway enzymes are present in low abundance, many are membrane-bound and difficult to solubilize, and the pathway itself is complex. Approximately 1000 proteins were separated and detected in liver homogenates. Twenty-one proteins were found to be affected by at least one treatment, and these could be divided into several coregulated groups.

3.3.1 MSN 413 (putative cytosolic HMG-CoA synthase) and sets of spots regulated coordinately or inversely

One group of spots (including a spot assigned to the cytosolic HMG-CoA synthase, MSN 413) showed the expected increase in abundance with lovastatin or cholestyramine, the synergistic further increase with lovastatin and cholestyramine, and a dramatic decrease with the high-cholesterol diet. Spot number 413 is the most strongly regulated protein in the present experiment, showing a 5- to 10-fold induction after a 1 week treatment with 0.075% lovastatin and 1% cholestyramine in the diet (Figs. 9 and 10). Its expression follows precisely the expectation for an enzyme whose abundance is controlled by the cholesterol level: it is progressively increased from the control levels by cholestyramine, lovastatin and lovastatin plus cholestyramine, and it sinks below the threshold of detection in animals fed the high-cholesterol diet. This spot has been tentatively identified as the cytosolic HMG-CoA synthase, based on a reaction with an antiserum to that protein provided by Dr. Michael Greenspan at Merck Sharp & Dohme Laboratories. This enzyme lies immediately before HMG-CoA reductase in the liver cholesterol biosynthesis pathway, and is known to be coregulated with it. Spot 413 has an SDS molecular weight of about 54 000 and a CPK pI of -11.4, in reasonably close agreement with a molecular weight of 57 300 and a CPK pI of -15.7 computed from the known sequence of the hamster enzyme [43].



Using a classical product-moment correlation test (Kepler procedure CORREL), a series of five additional spots was found to be coregulated with 413. The level of correlation was exceedingly high (> 95%). Two of these, 1250 and 933, are at similar molecular weights and approximately one charge more acidic than 413 (Fig. 9), indicating that they may be covalently modified forms of the 413 polypeptide. This suspicion is strengthened by the observation that both spots are also stained by the antibody to cytosolic HMG-CoA synthase. The remaining three correlated spots appear to comprise an additional related pair (1253 and one other spot) at around 40 kDa and a single spot (1119) of around 28 kDa. Because these two presumed proteins are present at […] and cytosolic HMG-CoA synthase is reported to consist of only one type of polypeptide, they are likely to represent other tightly coregulated enzymes.

A second group of six spots was selected based on a regulation pattern close to the inverse of that for spot 413 (MSNs 34, 79, 178, 182, 204, 347; data not shown). For these proteins, the lowest level of expression occurs with exposure to lovastatin plus cholestyramine and the highest level upon exposure to the high-cholesterol diet. Spots 182 and 79 are highly correlated and lie about one charge apart at the same molecular weight; they are probably isoforms of a single protein. The other four spots probably represent additional enzymes or subunits.
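The coregulation search described above amounts to correlating each spot's abundance profile across gels with that of spot 413. The sketch below is a plain Pearson product-moment correlation, not a reproduction of Kepler's CORREL procedure, whose internals are not described here. The spot profiles are hypothetical numbers, and the 0.95 cutoff is an assumption chosen to mirror the "> 95%" level quoted in the text; the negative cutoff picks up inversely regulated spots such as those of the second group.

```python
from math import sqrt

def pearson_r(x, y):
    """Classical product-moment (Pearson) correlation of two abundance profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no regulation signal
    return sxy / sqrt(sxx * syy)

def coregulation_groups(target, abundances, threshold=0.95):
    """Split spots into those tracking `target` (r >= threshold)
    and those following its inverse (r <= -threshold)."""
    ref = abundances[target]
    positive, inverse = {}, {}
    for msn, profile in abundances.items():
        if msn == target:
            continue
        r = pearson_r(ref, profile)
        if r >= threshold:
            positive[msn] = r
        elif r <= -threshold:
            inverse[msn] = r
    return positive, inverse

# Hypothetical per-gel abundances (same gel order for every spot); not study data.
abundances = {
    413: [10, 12, 25, 40, 90, 95],
    1250: [3, 4, 8, 13, 29, 31],
    933: [2, 2, 5, 9, 18, 20],
    182: [50, 48, 40, 30, 12, 10],
}
positive, inverse = coregulation_groups(413, abundances)
print(sorted(positive), sorted(inverse))  # -> [933, 1250] [182]
```

With many hundreds of spots, the same loop simply runs once per candidate spot; no multiple-comparison correction is applied in this sketch.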

3.2.2 MSN 235 and coregulated spots

A third group of five spots, mainly comprising mitochondrial proteins including putative mitochondrial HMG-CoA synthase spots, showed a modest induction by lovastatin alone, but little or no effect with any of the other treatments (including the combination of lovastatin and cholestyramine; Fig. 12). This result is intriguing because lovastatin was expected to affect only the regulation of enzymes of cholesterol synthesis, which is entirely extramitochondrial. Three of the spots (235, 133, 144) form a closely packed triad at approximately 30 kDa and are likely to represent isoforms of one protein. All three spots are stained by an antibody to the mitochondrial form of HMG-CoA synthase obtained from Dr. Greenspan, and subcellular fractionation indicates a mitochondrial location. The other two spots (633 at about 38 kDa and 724 at about 69 kDa) are each present at lower abundance than the members of the […] proteins of the putative mitochondrial pathway […] much more variable in their expression in all […].

Examination of all the coregulated groups suggests that quantitative statistical techniques can extract a wealth of interesting information from large sets of reproducible data. The abundance of spots in the 413 coregulation group (Fig. 11), for example, shows an amazing level of concordance in their expression among the five individuals of the lovastatin plus cholestyramine treatment group. This effect is not due to differences in total protein loading, since these have been removed by scaling, and since proteins with […] can be demonstrated (Fig. 13). Such effects raise the possibility that many gene regulation sets may be revealed through the study of a sufficiently large population of control animals (i.e., without any experimental manipulation). This approach, exploiting natural biological variation in protein expression rather than drug effects, offers an important incentive for the construction of a large library of control animal patterns.
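Scaling out total-loading differences, as mentioned above, can be sketched as normalizing each gel so that its matched spot abundances sum to a common total. This is an assumed normalization (the scaling actually used in Kepler is not described in this section), and the spot values and target total are hypothetical.

```python
def scale_gel(spots, target_total=1_000_000.0):
    """Rescale one gel's spot abundances so they sum to `target_total`.

    A difference in total protein loading multiplies every spot on a gel
    by the same factor, so dividing by the gel total removes it."""
    total = sum(spots.values())
    if total <= 0:
        raise ValueError("gel has no positive spot abundance")
    factor = target_total / total
    return {msn: value * factor for msn, value in spots.items()}

# Hypothetical gel and a copy of the same sample loaded 1.8 times more heavily.
gel_a = {413: 200.0, 1250: 50.0, 79: 750.0}
gel_b = {msn: 1.8 * value for msn, value in gel_a.items()}

scaled_a, scaled_b = scale_gel(gel_a), scale_gel(gel_b)
print(scaled_a[413], scaled_b[413])  # both approximately 200000.0
```

Because the factor is shared by all spots on a gel, relative patterns such as the concordance within a treatment group survive the scaling; a single very strongly induced spot does, however, slightly depress the scaled values of all other spots, which is one reason to sum over many matched spots.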



4 Conclusions 

Because of the widespread use of rat liver in both basic biochemistry and toxicology, there is a long-term need for a comprehensive database of liver proteins. The rat liver master pattern presented here has proven to be an accurate representation of this system, having been matched to more than 700 gels to date. As the number of proteins identified and the number of compounds tested for gene expression effects grows, we expect this database to contribute valuable insights into gene regulation. Its practical utility in several areas of mechanistic toxicology is already being demonstrated.

Received September 11, 1991



3.2.3 An example of an anti-synergistic effect

A sixth spot (367) shows strong induction by lovastatin (two- to threefold) and about half as much induction with lovastatin plus cholestyramine, but without sharing the animal-to-animal heterogeneity pattern of the 235 set (Fig. 13). This protein is also mitochondrial, and represents the clearest example of an anti-synergistic effect of lovastatin and cholestyramine. The existence of such an effect demonstrates that lovastatin and cholestyramine do not act exclusively through the same regulatory pathway.
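The anti-synergy argument compares induction over controls under lovastatin alone with induction under the combination. A minimal sketch, assuming fold change is the ratio of a group mean to the control mean and that "induction" means fold change minus one; the group names and abundance values are hypothetical, chosen so that the combination gives half the lovastatin induction, as described for spot 367.

```python
from statistics import mean

def fold_changes(groups, control="control"):
    """Mean abundance of each treatment group divided by the control mean."""
    base = mean(groups[control])
    return {name: mean(values) / base
            for name, values in groups.items() if name != control}

def combination_ratio(folds, single, combined):
    """Induction (fold - 1) under the combination relative to the single agent.

    A value between 0 and 1 means the combination induces less than the
    single agent alone, i.e. the second agent antagonizes the first."""
    single_induction = folds[single] - 1
    if single_induction == 0:
        raise ValueError("single agent shows no induction")
    return (folds[combined] - 1) / single_induction

# Hypothetical five-animal groups; not measured values for any spot.
groups = {
    "control": [100, 110, 90, 105, 95],
    "lovastatin": [250, 260, 240, 255, 245],
    "cholestyramine": [100, 105, 95, 100, 100],
    "lovastatin+cholestyramine": [175, 170, 180, 172, 178],
}
folds = fold_changes(groups)
print(folds["lovastatin"], combination_ratio(folds, "lovastatin", "lovastatin+cholestyramine"))
# -> 2.5 0.5
```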

3.2.4 Complexity of the cholesterol synthesis pathway

Taken together, these results suggest that treatment with lovastatin alone can affect both cytosolic and mitochondrial HMG-CoA synthase, while cholestyramine, on the other hand, either alone or in combination with lovastatin, has a strong effect on the putative cytosolic pathway but little or no effect on the putative mitochondrial pathway. An explanation for this difference may lie in lovastatin's effect on the levels of HMG-CoA and related precursor compounds that are exchanged between the cytosol and the mitochondrion, whereas cholestyramine may affect only the cytosolic pathway, which is directly controlled by cholesterol and bile acid levels. It remains to be explained why some
Figure 7. (a) Plot of computed isoelectric point versus gel X-position for two sets of carbamylated standard proteins (rabbit muscle CPK, […]; human hemoglobin β chain, filled diamonds) and several other proteins (shaded squares). (b) The identities of the various proteins represented by the squares are indicated by the numbers in corresponding positions on (a); these refer to Table 4.




Figure 9. Montage showing effects in the region of MSN:413. The montage shows a small window into one portion of the 2-D pattern, one row of windows for each experimental group, and one panel for each gel in the experiment. The leftmost panel in each row is a group-specific copy of the master pattern, followed by the patterns for the five individual rats in the group. The highlighted protein spots (filled circles) are spot 413 (on the right of each panel; identified as cytosolic HMG-CoA synthase) and two modified forms of it (1250 and 933). From the top, the rows (experimental groups) are: high cholesterol, controls, cholestyramine, lovastatin, and lovastatin plus cholestyramine.
Figure 10. Bar graph showing the quantitative effects of various treatments on the abundance of MSN:413 (cytosolic HMG-CoA synthase) in the gels of Fig. 9.




Figure 11. Bar graphs of a series of six coregulated spots including MSN:413. In the bar graphs, the abundances of the appropriate spot (master spot number shown at the top of the panel) in each animal are shown. The five five-animal groups are in the order (left to right): high cholesterol, controls, cholestyramine, lovastatin, and lovastatin plus cholestyramine. Each bar within a group represents one experimental animal liver (one 2-D gel). Note the correlated expression of the six spots, especially in the two far right (most strongly induced) groups.
[Table of master pattern spots: master spot number (MSN), gel X and Y coordinates, CPK pI, and SDS MW (the predicted molecular mass from the standard curve of Fig. 8). The table body is not recoverable from the source.]
Table 4. Computed pIs of some known proteins related to measured CPK pIs

No. | Protein | Database code | Residue counts | Computed pI
? | ? | ? | 32 57 15 53 27 | 3.96
5 | Serum albumin, rat | ABRT5 | 32 57 15 53 24 | ?
6 | Cu/Zn superoxide dismutase (Cu-Zn SOD), rat | ? | 8 ? 10 9 4 | ?.91
7 | Phospholipase C, phosphoinositide-specific (?), rat | A28807 | 34 42 9 49 21 | ?
8 | Albumin, human | ABHUS | 36 61 16 60 24 | ?
9 | Apo A-I lipoprotein, rat | A24700 | 16 24 6 23 12 | ?
10 | Pro-Apo A-I lipoprotein, human | LPHUA1 | 16 30 6 21 17 | ?
11 | NADPH cytochrome P-450 reductase, rat | RDRTCM | 41 60 21 38 35 | ?
12 | Retinol binding protein, human | VAHU | 18 10 2 10 14 | ?
13 | Actin β, rat | ATRTC | 23 26 9 19 18 | ?
14 | Actin γ, rat | ATRTC | 20 29 9 19 18 | 5.07
15 | Apo A-I lipoprotein, human | LPHUA1 | 16 30 5 21 16 | 5.10
16 | Apo A-IV lipoprotein, human | LPHUA4 | 20 49 8 28 24 | 4.86
17 | Tubulin α, rat | UBRTA | 27 37 13 19 21 | 4.66
18 | F1 ATPase β, bovine | PWBOB | 25 36 9 22 22 | 4.80
19 | Tubulin β, pig | UBPGB | 26 36 10 15 22 | 4.49
20 | Protein disulphide isomerase (PDI), rat hepatic | ISRTSS | 43 51 11 51 9 | 4.07
21 | Cytochrome b5, rat | CBRT5 | 10 15 6 10 4 | 4.59
22 | Apo C-II lipoprotein, human | LPHUC2 | 4 7 0 6 1 | 4.44

Amino acid pK values assumed in calculation: 3.9, …, 12.5
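Table 4 relates residue composition to a computed pI. The sketch below computes a pI from residue counts by bisection on the Henderson-Hasselbalch net charge. It is an illustration, not the spreadsheet used by the authors: it assumes the five count columns are Asp, Glu, His, Lys and Arg (the column headings are not legible), takes 3.9 and 12.5 from the table footnote for Asp and Arg, and uses hypothetical values for the remaining pKs and for the two termini, so its output need not reproduce the tabulated pIs.

```python
# Assumed pK values: 3.9 (Asp) and 12.5 (Arg) follow the Table 4 footnote;
# the Glu, His, Lys and terminal values are illustrative assumptions.
PK = {"D": 3.9, "E": 4.1, "H": 6.0, "K": 10.5, "R": 12.5, "Nterm": 8.0, "Cterm": 3.1}
ACIDIC = ("D", "E", "Cterm")
BASIC = ("H", "K", "R", "Nterm")

def net_charge(counts, ph, pk=PK):
    """Henderson-Hasselbalch net charge of a chain with the given residue counts."""
    c = dict(counts, Nterm=1, Cterm=1)  # one free amino and one carboxyl terminus
    pos = sum(c.get(g, 0) / (1 + 10 ** (ph - pk[g])) for g in BASIC)
    neg = sum(c.get(g, 0) / (1 + 10 ** (pk[g] - ph)) for g in ACIDIC)
    return pos - neg

def isoelectric_point(counts, lo=0.0, hi=14.0, tol=1e-6, pk=PK):
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(counts, mid, pk) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Residue counts of row 22 (Apo C-II), read under the column assumption above.
apo_c2 = {"D": 4, "E": 7, "H": 0, "K": 6, "R": 1}
print(round(isoelectric_point(apo_c2), 2))
```

Bisection is chosen because it needs only the sign of the net charge and cannot overshoot; swapping in a different pK set changes only the `PK` dictionary.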
An updated two-dimensional gel database of rat liver
proteins useful in gene regulation and drug effect 
studies 

We have improved upon the reference two-dimensional (2-D) electrophoretic map of rat liver proteins originally published in 1991 (N. L. Anderson et al., Electrophoresis 1991, 12, 907-930). A total of 53 proteins (102 spots) are now identified, many by microsequencing. In most cases, spots cut from wet, Coomassie Blue-stained 2-D gels were subjected to internal tryptic digestion [2], and individual peptides, separated by high-performance liquid chromatography (HPLC), were sequenced using a Perkin-Elmer 477A sequenator. Additional spots were identified using specific antibodies.



Figure 1 shows the current annotated 2-D map of F344 rat liver, analyzed using the Iso-DALT system (20 × 25 cm gels) and BDH 4-8 carrier ampholytes. Both the map itself and the master spot number system remain the same as shown in the original publication. Table 1 lists the important features of each identification shown, including the gel position, pI, and Mr for the most abundant or most basic form of each protein. Using this extended base of identified spots, a series of four improved calibration functions has been derived for the pI and SDS-Mr axes (the first two of which are shown in Fig. 2A and B). Both forward and reverse functions are derived, so that one can compute the physical properties of a spot with a given gel location, or inversely compute the gel position expected for a protein having given physical properties:

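$$Y_{\text{rat liver}} = f_{M_r \to Y_{\text{rat liver}}}\left(M_r^{\text{sequence-derived}}\right) \qquad (1)$$
$$X_{\text{rat liver}} = f_{\mathrm{p}I \to X_{\text{rat liver}}}\left(\mathrm{p}I^{\text{sequence-derived}}\right) \qquad (2)$$
$$M_r^{\text{gel-derived}} = f_{Y_{\text{rat liver}} \to M_r}\left(Y_{\text{rat liver}}\right) \qquad (3)$$
$$\mathrm{p}I^{\text{gel-derived}} = f_{X_{\text{rat liver}} \to \mathrm{p}I}\left(X_{\text{rat liver}}\right) \qquad (4)$$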
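Eqs. (1)-(4) form forward/reverse pairs between a gel coordinate and a physical property. The paper fits analytic equations with TableCurve (coefficients in Table 2, not reproduced here); as a stand-in, this sketch uses monotone piecewise-linear interpolation, which gives a forward function and its exact inverse from the same calibration points. The (gel X, experimental pI) pairs are taken from Table 1 purely for illustration, and the class name is hypothetical.

```python
from bisect import bisect_right

class MonotoneCalibration:
    """Piecewise-linear stand-in for a forward/reverse calibration pair.

    Both coordinates must increase strictly, so the map is invertible."""

    def __init__(self, pairs):
        pts = sorted(pairs)
        self.xs = [p[0] for p in pts]
        self.ys = [p[1] for p in pts]
        for seq in (self.xs, self.ys):
            if len(seq) < 2 or any(b <= a for a, b in zip(seq, seq[1:])):
                raise ValueError("need >= 2 strictly increasing calibration points")

    @staticmethod
    def _interp(u, us, vs):
        # Pick the bracketing segment; the end segments extend linearly
        # to values outside the calibrated range.
        i = min(max(bisect_right(us, u) - 1, 0), len(us) - 2)
        t = (u - us[i]) / (us[i + 1] - us[i])
        return vs[i] + t * (vs[i + 1] - vs[i])

    def forward(self, x):
        """Gel coordinate -> property (the role of Eqs. 3 and 4)."""
        return self._interp(x, self.xs, self.ys)

    def inverse(self, y):
        """Property -> expected gel coordinate (the role of Eqs. 1 and 2)."""
        return self._interp(y, self.ys, self.xs)

# (gel X, experimental pI) pairs taken from Table 1, for illustration only.
x_to_pi = MonotoneCalibration([
    (23.05, 4.03), (515.68, 4.73), (763.40, 5.19), (1033.48, 5.59),
    (1399.78, 6.00), (1730.72, 6.34), (2000.81, 6.75),
])
print(round(x_to_pi.forward(900.0), 2), round(x_to_pi.inverse(5.5), 1))
```

For the Mr axis, where Mr falls as Y rises, the same class could be given (Y, -log10 Mr) pairs; that transform is an assumption of this sketch, not the functional form listed in Table 2.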

A spreadsheet program (in Microsoft Excel) was developed to facilitate flexible computation of pIs from amino acid sequence data, and the results were entered into a relational database (Microsoft Access). A table of spot positions and sequence-derived pIs and Mrs was fitted with a large series of analytic equations using TableCurve (Jandel Scientific), and the four conversion Eqs. (1)-(4), relating computed pI and gel X coordinate, or computed molecular weight and gel Y coordinate, were selected based on criteria of simplicity, goodness of fit, and favorable asymptotic behavior. Table 2 lists the equations and coefficients. Application of Eqs. (3) and (4) to a spot's X and Y coordinates, given in [1], produces improved Mr estimates and allows computation of pI directly in pH units, instead of in terms of positions relative to creatine phosphokinase (CPK) charge standards. The inverse Eqs. (1) and (2) were used to compute the gel positions of a series of pI and Mr tick marks. These tick marks were plotted with SigmaPlot (Jandel), together with fiducial marks locating several prominent spots, and the resulting graphic was aligned over the synthetic gel image (computed by Kepler from the master gel pattern) using Freelance (Lotus Development). Maps were printed as PostScript output from Freelance, either in black and white (as shown here) or in color, where label color indicates subcellular location (available from the first author upon request). We have also used the rat liver 2-D pattern as presented here to calibrate the patterns of other samples. Using mixtures of rat liver and mouse liver samples, for example, we made composite 2-D patterns that allow use of the rat pattern to standardize both axes of the mouse pattern. This was accomplished by deriving transformations relating the rat and mouse X, and separately the rat and mouse Y, axes (Table 2, lower half; Fig. 2C and D) based on a series of spots that coelectrophorese in these closely related species. These functions were then applied to derive equations relating the mouse liver X and Y to pI and SDS-Mr (Eqs. 5 and 6 below). The resulting standardized 2-D pattern for B6C3F1 mouse liver is shown in Fig. 3.

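$$M_r^{\text{mouse liver}} = f_{Y_{\text{rat liver}} \to M_r}\Big(f_{Y_{\text{mouse liver}} \to Y_{\text{rat liver}}}\big(Y_{\text{mouse liver}}\big)\Big) \qquad (5)$$

$$\mathrm{p}I^{\text{mouse liver}} = f_{X_{\text{rat liver}} \to \mathrm{p}I}\Big(f_{X_{\text{mouse liver}} \to X_{\text{rat liver}}}\big(X_{\text{mouse liver}}\big)\Big) \qquad (6)$$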
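Eqs. (5) and (6) chain a species-to-species axis map with the rat liver calibration. A sketch of Eq. (6) under stated assumptions: the mouse-to-rat X map is fitted as a least-squares straight line through hypothetical coelectrophoresing spot pairs, and the rat X-to-pI function is a hypothetical linear stand-in for Eq. (4); neither is the functional form given in Table 2.

```python
def fit_line(pairs):
    """Ordinary least-squares line through (u, v) pairs; returns v = f(u)."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    suu = sum((u - mu) ** 2 for u, _ in pairs)
    if suu == 0:
        raise ValueError("need at least two distinct u values")
    slope = sum((u - mu) * (v - mv) for u, v in pairs) / suu
    intercept = mv - slope * mu
    return lambda u: intercept + slope * u

# Hypothetical (mouse liver X, rat liver X) positions of spots that
# coelectrophorese in a mixed rat + mouse liver gel.
matched = [(200.0, 230.0), (600.0, 650.0), (1000.0, 1070.0), (1500.0, 1595.0)]
mouse_x_to_rat_x = fit_line(matched)

def rat_x_to_pi(x):
    """Hypothetical rat liver X -> pI calibration (stand-in for Eq. 4)."""
    return 4.0 + 0.0014 * x

def mouse_x_to_pi(x):
    """Eq. (6): map a mouse liver X onto the rat liver X axis, then to pI."""
    return rat_x_to_pi(mouse_x_to_rat_x(x))

print(round(mouse_x_to_pi(1000.0), 3))  # -> 5.498
```

The Mr axis of Eq. (5) follows the same pattern with Y coordinates and a Y-to-Mr function.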

A slightly more complex approach can be used to stand- 
ardize samples that have few or no spots co-electropho- 
resing with rat liver proteins. In this case, a 2-D gel is 
prepared with a mixture of the two samples, and four 
functions (forward and backward, each for X and Y) are 
derived relating each sample's own master pattern to the 
composite. The required functions are then applied in a 
nested fashion to yield the desired result (using rat 
plasma as an example): 

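$$M_r^{\text{rat plasma}} = f_{Y_{\text{rat liver}} \to M_r}\Big(f_{Y_{\text{rat plasma+liver}} \to Y_{\text{rat liver}}}\big(f_{Y_{\text{rat plasma}} \to Y_{\text{rat plasma+liver}}}(Y_{\text{rat plasma}})\big)\Big) \qquad (7)$$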
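Eq. (7) nests three maps, read from the inside out: rat plasma Y to the composite (plasma plus liver) gel, composite to rat liver Y, and rat liver Y to Mr. A generic composition helper makes that order explicit. The three component functions below are hypothetical placeholders for the fitted curves, not values from Table 2, and the helper name `chain` is illustrative.

```python
from functools import reduce

def chain(*functions):
    """Compose right to left, as in the equations: chain(f, g, h)(v) == f(g(h(v)))."""
    return reduce(lambda outer, inner: (lambda v: outer(inner(v))),
                  functions, lambda v: v)

# Hypothetical stand-ins for the three fitted functions in Eq. (7).
def plasma_y_to_composite_y(y):
    return y + 12.0  # rat plasma Y -> rat plasma+liver composite Y

def composite_y_to_liver_y(y):
    return 0.98 * y  # composite Y -> rat liver Y

def liver_y_to_mr(y):
    return 200_000.0 * 10 ** (-y / 1000.0)  # rat liver Y -> Mr (assumed form)

plasma_y_to_mr = chain(liver_y_to_mr, composite_y_to_liver_y, plasma_y_to_composite_y)
print(round(plasma_y_to_mr(500.0)))
```

Keeping each fitted function separate means a new sample type only needs its own pair of functions relating its master pattern to the composite gel; the rat liver calibration is reused unchanged.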

Figure 1. Master 2-D gel pattern of Fischer 344 rat liver proteins, annotated with 53 protein identifications and computed pI and Mr axes. Tentative identifications are in italic type.



Table 1. Proteins identified in the 2-D pattern of F344 rat liver 



MSN* } 


Proieio ID" 


Protein name 


Identification comments 


Gel JT' 


Experimental 
p/" 


Gel r ] 


Experimental 

Mr ... 


126 


HADO-HUMAN" 


3-HA-3.4-DO: 3-hydroxy. 
amhranilaie-3.4-diory- 
genase 


Internal sequence 


871.95 


5.36 


92U5 


30 207 


137. 159, 288, 


DIDH.RAT 


3HDD: 3-hydroxysteroid 


Ab (T.M. Penning) and pure protein 


1857.52 


6.51 


822.52 


34 406 


258 




dihydrodiol reducuse 












173 


MUP.RAT 


i;u gJobulin 


Presence in liver microsome lumen, 
abundance in kidney, p/, M, 


919.16 


5.43 


1313.81 


19 549 


38 


ACTB.HU MAN 


Actis 3 


Analogy with other mammalian patterns 
(e.g. human) through electrophoresis 


763.40 


5.19 


693.64 


41 586 


68 


ACTG.HU MAN 


Aciin y 


Analogy with other mammalian patterns 
{e.g. human) through ^electrophoresis 


779.42 


521 


692.26 


41 677 


693 


AFAR.RAT 


AJlatoxin Bl aldehyde 
reducuse 


Internal sequence 


1993 J2 


6.72 


818.60 


34 593 


28. 21. 33 


ALBU.RAT 


Albumin 


Coelecuophoresis with principal plasma 
protein 


1262.81 


5.86 


445.64 


66 354 


43 


dham.rat 


Aldehyde dehydrogenase 


A'-Terminil sequence and AAA 


1317.72 


5.91 


589.03 


49 602 


96 


ARGI.RAT 


Arginase 


Internal sequence 


1730.72 


6.34 


756.02 


37 819 


117 


SUAR.RAT 


A/ylsulfotransferase 


Internal sequence 


1547.96 


6.14 


849.08 


33 186 


1163. 1161, 


GR78.RAT 


BIP (GRP-78) 


Ab (F. Wiizmann) 


665J3 


5.01 


397J9 


74 564 


1162, 20 
















185 


CAH3.RAT 


CA-IIJ 


Uncertain; by comparison with mouse 


1996.60 


6.72 


,1017.02 


26 887 


123 


CALM.HU MAN 


Calmodulin 


Analogy with human cellular patterns 
through coelectrophorests 


23.05 


4.03 


1433.25 


17 419 


3. 201, 48, 39, CRTCJUT 


Calreticulin 


Ab (Lanee Pohl) 


310.59 


4J4 


433.80 


68 206 


22, 24 
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Table 1. continued 



MSN" 


Protein IDb) 


Protein name 


Identification comments 


Gel r> 


Experimental 


Gel r* 


Expcnmt 
Af," 


11S*. U86, 


CPSM.RAT 


Carbamy] phosphate 


2-D of Dure nrotein* eismfirmed hv 


1453 J 6 


6.05 


181.64 


160 640 


114, 174, 118 




synthase 


Af*tcnninal uflueim *nd AAA 










5. 167. 157 
















54.61 


CATA.RAT 




Internal uouenv 


2000.81 


675 


499.64 


58 968 


136 


COX2.RAT 


COX-I1 


AK ft \L* Tunnunl ennftnnpri hv 
\*' r i mm nun j. ufuiiiuicy vj 


452.57 


4.61 


1062.67 


25 504 








in tenia] teouenn 










87 


CYB5.RAT 


Cytochrome B5 


2-D of pure protein; Ah; confirmed 


515.68 


4.73 


1370.55 


18 493 








by AAA 










41 


CK-RAT*' 


Cyiokeratin 


Location in eyioskcletal fraction 


1165.12 


5.75 


569.09 


51 448 


29 


CK-RAT" 


Cytokeratin 


Location in cytoskeletaJ fraction 


743.11 


5.15 


605 .23 


48 187 


5, 11 


ENPL-RAT' 


Endoplasmic 


Ab (F. Wtianann) 


567.73 


4.83 


263 37 


112 194 


60 | ENOA.RAT | Enolase α | Internal sequence and AAA | 1399.78 | 6.00 | 623.34 | 46 674
27 | ER60.RAT | ER-60 | N-terminal sequence (R. M. Van Frank) | 1184.10 | 5.77 | 523.31 | 56 169
17 | ATPB.RAT | F1 ATPase β | N-terminal sequence and AAA | 629.06 | 4.95 | 588.83 | 49 620
196 | ATP7.RAT | F1 ATPase 6 | Internal sequence | 1227.14 | 5.82 | 1184.65 | 22 310
79 | F16P.RAT | Fructose-1,6-bisphosphatase | Uncertain; by comparison with 1D in Garrison and Wagner (JBC 257:13135-13143) | 924.54 | 5.44 | 737.77 | 38 858
62, 78 | DHE3.RAT | Glutamate dehydrogenase | N-terminal sequence and internal sequence | 1887.39 | 6.35 | 566.92 | 51 655
125 | HAST-RAT e) | HAST-1: N-hydroxyarylamine sulfotransferase | Internal sequence | 1297.94 | 5.89 | 861.35 | 32 638
307 | HO1.RAT | Heme oxygenase 1 | Uncertain; available data from internal sequence | 1219.19 | 5.81 | 915.71 | 30 423
413, 1250, 933 | HMCS.RAT | HMG CoA synthase, cytosolic | Ab (J. Germershausen) | 1033.48 | 5.59 | 538.13 | 54 571
133, 144, 235 | HMCS.RAT | HMG CoA synthase, mitochondrial (frag) | Ab (J. Germershausen), N-terminal sequence (Steiner/Lottspeich) | 666.40 | 5.02 | 1019.42 | 26 811
8, 23, 1307 | HS7C.RAT | HSC-70 | Positional homology (with human, etc.) through coelectrophoresis | 811.87 | 5.27 | 425.76 | 69 521
15, 25, 110 | P60.RAT | HSP-60 | Ab (F. Witzmann); confirmed by N-terminal sequence and AAA | 845.09 | 5.32 | 520.03 | 56 561
971 | HS70-RAT e) | HSP-70 | Ab (F. Witzmann) | 976.11 | 5.51 | 437.14 | 67 674
1216, 1215, 90 | HS90-RAT e) | HSP-90 | Ab (F. Witzmann) | 659.86 | 5.00 | 329 | 90 107
256 | INGI-HUMAN | Interferon-γ induced protein | Internal sequence | 993.85 | 5.34 | 1006.04 | 27 237
415, 734 | LAMB-RAT e) | Lamin B | Positional homology with human through coelectrophoresis, nuclear location | 737.10 | 5.14 | 425.19 | 69 615
80 | LAMR-RAT e) | Laminin receptor | Internal sequence | 534.02 | 4.77 | 697.62 | 41 327
227 | FABL.RAT | L-FABP (liver fatty acid binding protein) | Ab (N. M. Bass) | 1586.09 | 6.18 | 1483.43 | 16 622
134 | MDHC.MOUSE | Malate dehydrogenase | Internal sequence | 1270.85 | 5.86 | 861.96 | 32 620
18, 35, 226 | GR75-RAT e) | Mitochondrial grp75 | Positional homology with human through coelectrophoresis | 905.67 | 5.41 | 413.67 | 71 589
175, 251 | NCPR.RAT | NADPH P450 reductase | 2-D of pure protein | 824.69 | 5.29 | 393.11 | 75 366
1168, 1170, 1171 | PDI.RAT | PDI; Protein disulfide isomerase | N-terminal sequence (R. M. Van Frank), Ab | 564.10 | 4.83 | 528.47 | 55 618
47, 93 | ALBU.RAT | Pro-albumin | Microsomal lumen location, pI, Mr relative to albumin | 1391.03 | 5.99 | 446.68 | 66 195
236 | APA1.RAT | Pro-Apo A-I lipoprotein | Coelectrophoresis with plasma protein | 920.41 | 5.43 | 1137.31 | 23 467
320 | IPK1.BOVIN | Protein kinase C inhibitor 1 | Internal sequence; homology with bovine protein | 1480.01 | 6.08 | 1458.81 | 17 007
152 | PNPH.MOUSE | Purine nucleoside phosphorylase | Internal sequence | 1507.19 | 6.10 | 911.16 | 30 599
1179, 1180, 1181, 1182, 1183 | PYVC.RAT e) | Pyruvate carboxylase | Tentative; 2-D of pure protein (J. G. Henslee, JBC 1979); reported in Biochim. Biophys. Acta 1022, 115-125 | 1485.10 | 6.08 | 223.32 | 131 589
55, 103 | SM30.RAT | SMP-30: Senescence marker protein-30 | Internal sequence | 721.71 | 5.11 | 830.10 | 34 051
135 | SODC.RAT | Superoxide dismutase | AAA; confirmed by internal sequence (R. M. Van Frank) | 1161.14 | 5.74 | 1388.68 | 18 173
172 | TPM-RAT e) | Tm: tropomyosin | Location in cytoskeleton, 2-D position relative to human, Ab | 476.24 | 4.66 | 957.86 | 28 865
277, 56 | TBA1.RAT | Tubulin α | Positional homology with human through coelectrophoresis, cytoskeletal location | 681.12 | 5.06 | 537.67 | 54 620
50, 1225 | TBB1.RAT | Tubulin β | Positional homology with human through coelectrophoresis, cytoskeletal location | 621.29 | 4.93 | 535.48 | 54 855
1224 | VIME.RAT | Vimentin | Positional homology with human through coelectrophoresis, cytoskeletal location | 673.00 | 5.03 | 539.30 | 54 426



1980    Electrophoresis 1995, 16, 1977-1981



Table 1. continued

MSN a) | Protein ID b) | Protein name | Identification comments | Gel X c) | Experimental pI d) | Gel Y c) | Experimental Mr d)
113 | - | Not in sequence databases | Internal sequence | 1191.18 | 5.78 | 680.42 | 42 469
104 | BBPL.RAT | 23 kDa morphine-binding protein | Internal sequence | 773.11 | 5.20 | 1182.41 | 22 363

a) Master spot number (MSN) from [1]
b) SwissPROT identifier
c) Coordinates of the most basic or most abundant assigned spot on the F344 master gel pattern
d) pI and Mr of the most basic or most abundant assigned spot, derived from the calibration functions included here
e) SwissPROT-style proposed identifier
Abbreviations: AAA, amino acid analysis; Ab, antibody



Table 2. Equations and coefficients

Function | Equation | r² | a | b | c | d | e
Rat gel Y = f(computed Mr) | $y = a + b\,e^{-x/c}$ | 0.988181021 | 178.74803 | 1967.7892 | 32363.958
Rat gel X = f(computed pI) | $y = a + bx + cx/\ln x + d/x + e/x^{1.5}$ | 0.99247216 | -8685665.5 | -904497.94 | 3856926.1 | 18276844 | -27154534
Computed Mr = f(rat gel Y) | $y = a + bx^{c}$ | 0.9960177 | -4464.5809 | 19095881 | -0.9086255
Computed pI = f(rat gel X) | $y = a + bx + cx^{2} + dx^{2}\ln x + ex^{3}$ | 0.99176499 | 4.044686 | -0.00114238 | 0.0000323 | -0.00000455 | 0.00000000176
Mouse gel Y = f(rat gel Y) | $y = a + bx + cx^{1.5} + dx^{0.5}\ln x + ex/\ln x$ | 0.99951069 | 11861.44 | 678.91666 | -0.78964914 | 1567.1639 | -6953.9592
Mouse gel X = f(rat gel X) | $y = a + bx^{2}\ln x + cx^{2.5} + dx^{3}$ | 0.99926349 | 58.935923 | 0.00091353 | -0.000213688 | 0.00000159
Rat gel Y = f(mouse gel Y) | $y = a + bx^{2}\ln x + cx^{2.5} + dx^{3}$ | 0.99950032 | 69.740526 | 0.00050772 | -0.000130392 | 0.00000116
Rat gel X = f(mouse gel X) | $y = a + bx + cx^{2}\ln x + dx^{2.5} + ex^{3}$ | 0.9992832 | -198.07189 | 2.0899063 | -0.000671191 | 0.000145189 | -0.000000986
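Applying the Table 2 calibrations is a single closed-form evaluation per spot. The minimal Python sketch below assumes the functional forms and rounded coefficients as they read in Table 2 (rat gel Y from Mr, and pI from rat gel X); all function and constant names are hypothetical. With these rounded coefficients the pI function returns a value within about 0.01 of the tabulated pI for tropomyosin (gel X 476.24, pI 4.66).

```python
import math

# Coefficients as read from Table 2 (F344 rat liver master gel); names are hypothetical.
# Rat gel Y = f(computed Mr):  y = a + b*exp(-x/c)
RAT_Y_FROM_MR = (178.74803, 1967.7892, 32363.958)
# Computed pI = f(rat gel X): y = a + b*x + c*x^2 + d*x^2*ln(x) + e*x^3
PI_FROM_RAT_X = (4.044686, -0.00114238, 0.0000323, -0.00000455, 0.00000000176)


def rat_gel_y_from_mr(mr: float) -> float:
    """Predicted vertical gel position for a protein of relative mass mr."""
    a, b, c = RAT_Y_FROM_MR
    return a + b * math.exp(-mr / c)


def pi_from_rat_gel_x(x: float) -> float:
    """Calibrated pI for a spot at horizontal position x on the rat master gel."""
    a, b, c, d, e = PI_FROM_RAT_X
    return a + b * x + c * x**2 + d * x**2 * math.log(x) + e * x**3


if __name__ == "__main__":
    # Tropomyosin spot from Table 1: gel X = 476.24, tabulated pI = 4.66
    print(round(pi_from_rat_gel_x(476.24), 2))
    # Larger proteins run higher on the gel (smaller Y)
    print(rat_gel_y_from_mr(20_000) > rat_gel_y_from_mr(100_000))
```

Because the fits involve large, partly cancelling terms, the coefficients should be used at full printed precision; rounding them further shifts the result noticeably.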



y=a+bx+cx/l nx+d/x+e/x^ 1 .5) 




B 



y=a+bexp(-x/c) 
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150000 
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y=a+bx A 2lnx+cx A (2.5)+dx A 3 




2 
Figure 2. Plots showing fits of selected equations (continuous curves) to data on identified proteins (square symbols). (A) pI computed from sequence data versus gel X position for identified spots in F344 rat liver; (B) Mr computed from sequence data versus gel Y position for identified spots in F344 rat liver; (C) gel X position for spots in B6C3F1 mouse liver versus X position in F344 rat liver, for coelectrophoresing spots; (D) gel Y position for spots in B6C3F1 mouse liver versus Y position in F344 rat liver, for coelectrophoresing spots. In each case, inverse equations were also computed (Table 2).
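The calibrations in Figure 2 are fitted against pI and Mr "computed from sequence data". The paper's exact procedure is not given here, so the sketch below is an assumption: it uses standard average residue masses for Mr and one commonly used pKa set (EMBOSS-style values) for pI, locating the pI by bisection on the net charge, which falls monotonically as pH rises. Other pKa sets, such as Bjellqvist's values used by ExPASy, give somewhat different pIs.

```python
# Average residue masses in Da (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# One common pKa set (EMBOSS-style); an assumption, not the paper's stated values.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mr(seq: str) -> float:
    """Average relative mass of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at a given pH."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

For a single glycine this pKa set puts the pI midway between the terminal pKas, at 6.1; post-translational modifications and processing (e.g. the pro-forms in Table 1) would move real spots away from such sequence-based values.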
Figure 1. Master 2-D gel pattern for B6C3F1 mouse liver, standardized using the F344 rat liver pattern identifications according to the method described in the text. Twenty-nine proteins are identified.



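$$\mathrm{p}I_{\text{rat plasma}} = f_{\text{rat liver }X \to \mathrm{p}I}\Big(f_{\text{rat plasma+liver }X \to \text{rat liver }X}\big(f_{\text{rat plasma }X \to \text{rat plasma+liver }X}(X_{\text{rat plasma}})\big)\Big) \qquad (8)$$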

This unified approach, in which one well-populated 2-D pattern is used to standardize a family of other patterns, has the additional advantage that the resulting pI and Mr scales are directly compatible. Hence one can compare the relative pIs of mouse and rat versions of a sequenced protein in a consistent pI measurement system, and select likely inter-species analogs based on positional relationships on common scales. Adoption of immobilized pH gradient (IPG) technology [4-7] will result in substantial improvements in pI positional reproducibility for standard 2-D maps such as those presented here; however, we believe that our approach will continue to be useful in establishing the empirical pH gradient actually achieved by such gels under given experimental conditions (temperature, urea concentration, etc.), and in relating patterns run on different IPG ranges and using different lots of IPG gels (between which some variation will persist). Development of rodent organ maps is a continuing effort in our laboratories [8-10] and results in regular additions of identified proteins. Those who wish to receive current rodent liver maps, with color annotations, should send a stamped self-addressed envelope to the first author.
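Equation (8) standardizes one pattern by chaining calibration functions, and the same composition carries a spot on the B6C3F1 mouse master gel onto the rat X scale and from there to pI. The sketch below is a minimal illustration under the assumption that the Table 2 forms and coefficients are read correctly; the names are hypothetical. Because each forward and inverse mapping was fitted independently, a round trip rat X to mouse X to rat X lands within a few gel units of the start rather than exactly on it.

```python
import math

# Coefficients as read from Table 2; names are hypothetical.
MOUSE_X_FROM_RAT_X = (58.935923, 0.00091353, -0.000213688, 0.00000159)
RAT_X_FROM_MOUSE_X = (-198.07189, 2.0899063, -0.000671191, 0.000145189, -0.000000986)
PI_FROM_RAT_X = (4.044686, -0.00114238, 0.0000323, -0.00000455, 0.00000000176)


def mouse_x_from_rat_x(x):
    # y = a + b*x^2*ln(x) + c*x^2.5 + d*x^3
    a, b, c, d = MOUSE_X_FROM_RAT_X
    return a + b * x**2 * math.log(x) + c * x**2.5 + d * x**3


def rat_x_from_mouse_x(x):
    # y = a + b*x + c*x^2*ln(x) + d*x^2.5 + e*x^3
    a, b, c, d, e = RAT_X_FROM_MOUSE_X
    return a + b * x + c * x**2 * math.log(x) + d * x**2.5 + e * x**3


def pi_from_rat_x(x):
    # y = a + b*x + c*x^2 + d*x^2*ln(x) + e*x^3
    a, b, c, d, e = PI_FROM_RAT_X
    return a + b * x + c * x**2 + d * x**2 * math.log(x) + e * x**3


def pi_for_mouse_spot(mouse_x):
    """Compose calibrations (mouse gel X -> rat gel X -> pI), in the spirit of Eq. (8)."""
    return pi_from_rat_x(rat_x_from_mouse_x(mouse_x))


if __name__ == "__main__":
    mx = mouse_x_from_rat_x(1000.0)
    print(rat_x_from_mouse_x(mx), pi_for_mouse_spot(mx))
```

Composing fitted functions this way keeps every derived pattern on the rat pI scale, which is what makes the mouse and rat pIs directly comparable.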
ABSTRACT Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis, together with recent advances in protein sequence analysis, has made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases, it is now possible to reveal phenotype-specific proteins, to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming increasingly important in view of the concerted effort to map and sequence the entire genome.
Celis, J. E.; Rasmussen, H. H.; Leffers, H.; Madsen, P.; Honoré, B.; Gesser, B.; Dejgaard, K.; Vandekerckhove, J. Human cellular protein patterns and their link to genome DNA sequence data: usefulness of two-dimensional gel electrophoresis and microsequencing. FASEB J. 5: 2200-2208; 1991.

Key Words: human protein patterns • 2-dimensional gel protein databases • gene expression • microsequencing • cDNA cloning • linking protein and DNA information • genome mapping and sequencing

Proteins synthesized from information contained in the DNA orchestrate most cellular functions. The total number of proteins synthesized by a typical human cell is unknown, although current estimates range from 3000 to 6000. Of these, as many as 70% may perform household functions and are expected to be shared by all cell types irrespective of their origin. There are many different cell types in the human body, with perhaps 30,000 to 50,000 proteins expressed in the organism as a whole, judged from the fact that about … of the haploid genome corresponds to genes. Today only a small fraction of the total set of proteins has been identified, and little is known about the protein patterns of individual cell types or their variation under physiological and abnormal conditions.

In the past 15 years, high resolution 2-dimensional gel electrophoresis has been the technique of choice to determine the protein composition of a given cell type and for monitoring changes in gene activity through quantitative and qualitative analysis of the thousands of proteins that orchestrate various cellular functions (refs 1-6 and references therein). The technique, originally described by O'Farrell, separates proteins in terms of their isoelectric point (pI) and molecular weight. Usually one chooses a condition of interest and the cell reveals the global protein behavioral response, as all detected proteins can be analyzed both qualitatively and quantitatively in relation to each other. At present, most available 2-dimensional gel techniques (regular gel format) can resolve between 1000 and 2000 proteins from a given mammalian cell type, a number that corresponds to about 2 million base pairs of coded DNA. Less abundant proteins can be detected by analyzing partially purified cellular fractions.
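Reading the "2 million base pairs" figure backwards gives the coding length it implies per resolved protein. As a back-of-envelope check, assuming each resolved spot is the product of a distinct gene and counting only coding sequence (three bases per residue):

$$\frac{2\times 10^{6}\ \text{bp}}{1000\text{-}2000\ \text{proteins}} = 1000\text{-}2000\ \text{bp per protein} \;\Rightarrow\; \frac{1000\text{-}2000\ \text{bp}}{3\ \text{bp/residue}} \approx 330\text{-}670\ \text{residues per protein}$$

That range is consistent with typical cellular polypeptides, so the two numbers in the text agree with each other under these assumptions.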

Two-dimensional gel electrophoresis has been widely applied to analysis of cellular protein patterns from bacteria to mammalian cells (refs 1-6 and references therein). In spite of much work, however, information gathered from these studies has not reached the scientific community in its fullness because of the lack of standardized gel systems and the lack of means for storing and communicating protein information. Only recently, because of the development of appropriate computer software (7-13), has it been possible to scan gels, assign numbers to individual proteins, and store the wealth of information in quantitative and qualitative comprehensive 2-dimensional gel protein databases (4, 14-23), i.e., those containing information about the various properties (physical, chemical, biological, biochemical, physiological, genetic, immunological, architectural, etc.) of all the proteins that can be detected in a given cell type. Such integrated 2-dimensional gel protein databases offer an easy and standardized medium in which to store and communicate protein information and provide a unique framework in which to focus a multidisciplinary approach to study the cell. Once a protein is identified in the database, all of the information accumulated can be easily retrieved and made available to the researcher. In the long run, protein databases are expected to foster a wide variety of biological information that may be instrumental to researchers working in many areas of biology, among others, cancer and oncogene studies, differentiation, development, drug development and testing, genetic variation, and diagnosis of genetic and clinical diseases (Fig. 1).

The approach using systematic 2-dimensional gel protein analysis has recently gained a new dimension with the advent of techniques to microsequence major proteins recorded in the databases (refs 24-42 and references therein). Partial protein sequences can be used to search for protein identity as well as to prepare specific DNA probes for cloning as-yet-uncharacterized proteins (Fig. 1). As these sequences can be stored in the database (see, for example, Fig. 2H), they offer a unique opportunity to link information on proteins with the existing or forthcoming DNA sequence data on the human genome (Fig. 1) (20, 36, 39).

Figure 1. Interface between partial protein sequence databases, comprehensive 2-dimensional gel databases, and the human genome sequencing project. Appropriate software is required to compare protein and DNA sequences. In general, although the inference of a protein's sequence from the DNA sequence (thick arrow) is direct and unambiguous, the DNA sequence can only be inferred approximately from the protein sequence (thin arrow), and cloning of the gene requires either a cDNA or the requisite group of oligonucleotide probes deduced from the partial amino acid sequence. Modified from ref 6.

Using the integrated approach offered by comprehensive 2-dimensional gel databases (Fig. 1), it will be possible to identify phenotype-specific proteins; microsequence them and store the information in the database; search for homology with previously characterized proteins; clone the cDNAs; assign partial protein sequences to genes for which the full DNA sequence and the chromosome location are known; and study the regulatory properties and function of groups of proteins (pathways, organelles, etc.) that are coordinately expressed in a given biological process. Comprehensive 2-dimensional gel protein databases will depict an integrated picture of the expression levels and properties of the thousands of protein components of organelles, pathways, and cytoskeletal systems in both physiological and abnormal conditions, and are expected to lead to identification of new regulatory networks in different cell types and organisms. In the future, 2-dimensional gel protein databases may be linked to each other as well as to national and international specialized databanks on nucleic acid and protein sequences, protein structures, NMR experimental data, complex carbohydrates, etc.

A few 2-dimensional gel protein databases that are accessible in computer form have been published in extenso: these correspond to the protein-gene database of Escherichia coli K-12 developed by Neidhardt and colleagues (14, 23), the rat REF52 database established by Garrels and co-workers at Cold Spring Harbor (18, 22), and a few human databases (transformed amnion cells [15, 20], normal embryonal lung MRC-5 fibroblasts [17, 21], keratinocytes [19], and peripheral blood mononuclear cells [15]) developed in Aarhus. Given space limitations, and to keep this review in focus, we will concentrate on the computerized analysis of human cellular 2-dimensional gel patterns, and in particular on the steps involved in establishing comprehensive 2-dimensional gel databases that can link protein and DNA information.



MAKING AND MANAGING A COMPREHENSIVE
2-DIMENSIONAL GEL DATABASE OF HUMAN
CELLULAR PROTEINS

The first step in making a comprehensive 2-dimensional gel protein database is to prepare a synthetic image (digital form of the gel image) of the gel (fluorogram, Coomassie blue- or silver-stained gel) to be used as a standard or master reference. This can be done with laser scanners, charge-coupled device (CCD)² array scanners, television cameras, rotating drum scanners, and multiwire chambers (13). Computerized analysis systems for spot detection, quantitation, pattern matching, and data handling (access and retrieval of information, database making) have been described in the literature (ELSIE [43], GELLAB [11], HERMeS [44], MELANIE [10], QUEST [9], and TYCHO [8]), and some are available commercially (PDQUEST, Protein Databases Inc., Huntington, N.Y.; KEPLER, Large Scale Biology, Rockville, Md.; Visage, BioImage Corporation, Ann Arbor, Mich.; Gemini, Joyce Loebl, Gateshead; Microscan 1000, Technology Resources Inc., Nashville, Tenn.; and MasterScan, Billerica, Mass.). Unfortunately, most of these systems are incompatible with one another; their advantages and disadvantages have been discussed by Miller (13).

In our work station in Aarhus, fluorograms are scanned with a Molecular Dynamics laser scanner and the data are analyzed using the PDQUEST II software (Protein Databases Inc.) (12) running on a SPARCstation computer 4100 FC-8-P3 from Sun Microsystems, Inc. The scanner measures intensity in the range of 0-2.0 absorbance. A typical scan of a 17 × 17 cm fluorogram takes about 2 min. Steps in image analysis include initial smoothing, background subtraction, final smoothing, spot detection, and fitting of ideal Gaussian distributions to spot centers. Spot intensity is calculated as the integral of the fitted Gaussian. If calibration strips containing individual segments of a known amount of radioactivity are used, it is possible to merge multiple exposures of the sample image into a single data image of greater dynamic range. Once the synthetic image is created, it can be stored on disk and displayed directly on the monitor. Functions that can be used to edit the images include: cancel (for example, to erase scratches that may have been interpreted as spots by the computer, or to cancel streaks or low-dpm spots), combine (sometimes a spot may be resolved into several closely packed spots), restore, uncombine, and add spot to the gel. The process is time consuming, about 1.5 days per image. Edited standard images can be matched to other synthetic images. Figure 2A shows a portion of a standard synthetic image (IEF) of a fluorogram of [35S]methionine-labeled cellular proteins from human AMA cells (master database) (20). Images can be displayed either in black and white (resembling the original fluorograms) or in color (other images in Fig. 2), depending on the need. As shown in Fig. 2B, each polypeptide is assigned a number by the computer, which facilitates the entry and retrieval of qualitative and quantitative information for any given spot in the gel (20). The standard image can be matched automatically by the computer to other standard or reference gels (Fig. 2C: matching of AMA cellular proteins [left] to MRC-5 proteins [right]), provided a few landmark spots are given manually as references (indicated with a + in Fig. 2C) to initiate the process.
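Since spot intensity is taken as the integral of a fitted Gaussian, it helps to see that integral explicitly. For an axis-aligned elliptical Gaussian with peak height A and widths σx and σy, the volume has the closed form 2πAσxσy. The sketch below assumes this axis-aligned model (PDQUEST's internal spot model is not specified here) and checks the closed form against a brute-force Riemann sum; the function names are hypothetical.

```python
import math


def gaussian_spot_volume(amplitude, sigma_x, sigma_y):
    """Closed-form integral of A*exp(-(dx^2/(2*sx^2) + dy^2/(2*sy^2))) over the plane."""
    return 2 * math.pi * amplitude * sigma_x * sigma_y


def numeric_volume(amplitude, sigma_x, sigma_y, step=0.05, half_width=6.0):
    """Riemann sum over +/- half_width standard deviations, as a check on the closed form."""
    nx = round(half_width * sigma_x / step)
    ny = round(half_width * sigma_y / step)
    total = 0.0
    for i in range(-nx, nx + 1):
        dx = i * step
        for j in range(-ny, ny + 1):
            dy = j * step
            total += amplitude * math.exp(-(dx**2 / (2 * sigma_x**2) + dy**2 / (2 * sigma_y**2)))
    return total * step * step


if __name__ == "__main__":
    print(gaussian_spot_volume(2.0, 1.0, 1.5), numeric_volume(2.0, 1.0, 1.5))
```

The closed form is what makes Gaussian fitting attractive for quantitation: once A, σx and σy are fitted, the integrated intensity needs no pixel summation and is less sensitive to neighbouring overlapping spots.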



²Abbreviations: CCD, charge-coupled device; PCNA, proliferating cell nuclear antigen; HPLC, high performance liquid chromatography.
The automatic matching process, which has been described in detail by Garrels et al. (12), takes about 5 min. Matched proteins are indicated with the same letters in both gels (Fig. 2C). The usefulness of this function is emphasized by the fact that data accumulated on common household proteins can be easily transferred to any other human cell type whose 2-dimensional gel cellular protein pattern is matched to our standard AMA 2-dimensional gel protein image. Alternatively, if the standard gel is part of a matchset (set of gels in a given experiment), it can be used as a linker gel to compare, for example, the quantitative values of a given protein throughout the experiment (see Fig. 2D; levels of some proteins in normal and SV40-transformed human MRC-5 fibroblasts) or with other standard images in different sets of cross-matched experiments (18, 22).
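Matching starts from a few manually chosen landmark spots. The algorithm of Garrels et al. (12) is not reproduced here; the sketch below is a simplified, hypothetical scheme that illustrates the idea: fit a least-squares affine map from landmark pairs, project every reference spot into the second gel, and accept the nearest spot within a distance cutoff. It is greedy and does not enforce one-to-one matches, which a production matcher would, and real gels also show local distortions that a single affine map cannot absorb.

```python
import math


def _solve3(m, v):
    """Solve a 3x3 linear system m @ p = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (a[r][3] - sum(a[r][k] * p[k] for k in range(r + 1, 3))) / a[r][r]
    return p


def fit_affine(src, dst):
    """Least-squares affine map taking landmark positions src onto dst (>= 3 non-collinear pairs)."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    pu = _solve3(ata, atu)
    pv = _solve3(ata, atv)

    def transform(x, y):
        return (pu[0] * x + pu[1] * y + pu[2], pv[0] * x + pv[1] * y + pv[2])

    return transform


def match_spots(ref_spots, other_spots, transform, max_dist=10.0):
    """Pair each reference spot with the nearest spot in the other gel after transformation."""
    matches = {}
    for ref_id, (x, y) in ref_spots.items():
        tx, ty = transform(x, y)
        other_id, pos = min(other_spots.items(), key=lambda kv: math.dist((tx, ty), kv[1]))
        if math.dist((tx, ty), pos) <= max_dist:
            matches[ref_id] = other_id
    return matches
```

Usage: pass landmark coordinates from the standard gel as `src` and their manually identified counterparts as `dst`, then call `match_spots` with dictionaries mapping spot numbers to (x, y) positions.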

Once a standard map of a given protein sample is made, one can enter qualitative annotations to make a reference database. Our master 2-dimensional gel database of transformed human amnion cell (AMA) proteins (20) lists 3430 polypeptides, of which 2592 correspond to cellular components, having pIs ranging from 4 to 13 and molecular weights between 8.5 and 230 kDa. The most abundant proteins in the database correspond to total actin (3.87% of total protein; about 90 million molecules per cell), while the least abundant of the recorded polypeptides are present in the vicinity of 5000 molecules per cell. Some annotation categories we are using to establish the master AMA database include: 1) protein identification (comigration with purified proteins, 2-dimensional immunoblotting, microsequencing); 2) amounts (total amounts and levels of synthesis); 3) subcellular localization (nuclear, cytoskeletal, membrane, membrane receptors, specific organelles, etc.); 4) antibodies; 5) posttranslational modifications (phosphorylation, glycosylation, methylation, etc.); 6) microsequencing; 7) cell cycle specificity (specific variations in levels of synthesis and amount); 8) regulatory behavior (effect of hormones, growth factors, heat shock, etc.); 9) rate of synthesis in normal and transformed cells (proliferation-sensitive proteins, cell cycle-specific proteins, oncogenes, components of the pathway (or pathways) that control cell proliferation); 10) … (… with proteins of known function); 11) sets of proteins that are coordinately regulated (hierarchy of controls, differential gene expression in various cells, etc.); 12) cDNAs (cloned cDNAs); 13) proteins that are specific to a given disease (systematic comparison of protein patterns of fibroblast proteins from healthy and diseased individuals); 14) expression and exploitation of transfected cDNAs; 15) pathways (metabolic, others); 16) gene localization (genetic and physical); 17) effect of microinjected antibody on patterns of protein synthesis; and 18) secreted proteins. Information entered for any spot in a given annotation category can be easily retrieved by asking the computer to display the information on the color screen. For example, Fig. 2E shows a synthetic image of a NEPHGE gel (master AMA database) displaying the information contained under the entry "glycolytic pathway". Alternatively, one can use the function "peruse annotations for spot" to directly ask the computer to list all the entries available for a particular protein. By clicking the mouse on a given entry (in this case, presence in fetal human tissues), it is possible to take a quick look at the information in that particular entry (Fig. 2F).
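Abundance annotations of the form "percentage of total protein" and "molecules per cell" are linked through Mr and Avogadro's number. The sketch below assumes an actin Mr of about 42,000 (not stated in the text) to back-calculate total protein per cell from the actin figures, roughly 160 pg, and then applies it to lipocortin V (0.110%, 33.3 kDa in Table 1). The result, about 3.2 million molecules, is of the same order as the tabulated 2,800,000; the gap reflects the assumed Mr values. Function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def total_protein_pg(reference_molecules, reference_fraction, reference_mr):
    """Back-calculate total protein per cell (pg) from one well-characterized protein."""
    grams = reference_molecules * reference_mr / AVOGADRO
    return grams / reference_fraction * 1e12


def molecules_per_cell(fraction_of_total, mr, total_pg):
    """Copies per cell of a protein that makes up `fraction_of_total` of total_pg."""
    grams = fraction_of_total * total_pg * 1e-12
    return grams / mr * AVOGADRO


if __name__ == "__main__":
    total = total_protein_pg(90e6, 0.0387, 42_000)  # actin; Mr 42,000 is an assumption
    lipocortin_v = molecules_per_cell(0.00110, 33_300, total)
    print(f"{total:.0f} pg protein per cell, {lipocortin_v:.2e} lipocortin V molecules")
```

The same two functions, applied to the detection floor of about 5000 molecules per cell, give a feel for how small a mass fraction the least abundant recorded spots represent.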

A major obstacle encountered in building comprehensive 2-dimensional gel protein databases is identifying the large number of proteins separated by this technology. In our databases (20, 21), known proteins are identified by one or a combination of the following procedures: 1) comigration with known proteins; 2) 2-dimensional gel immunoblotting using specific antibodies; and 3) microsequencing of Coomassie blue-stained human proteins from 2-dimensional gels (see next section). Protein identification by means of microsequencing may be difficult, as individual protein members of families with short peptide differences may escape detection. In the gene-protein database of E. coli K-12 (14, 23), another major 2-dimensional gel database, proteins are at present identified by a wider range of tests that include comigration with purified proteins; … (deletion, insertion, frameshift, nonsense, missense, regulatory); plasmid-bearing strains; … synthesis of protein; selective labeling (methylation, phosphorylation); peptide map similarity; and physiological criteria and selective derivatization.



To date, we have received nearly 550 antibodies from laboratories …; these have been tested by 2-dimensional gel immunoblotting for antigen detection. Similarly, purified proteins and organelles from several laboratories have greatly aided identification of unknown proteins (20, 21). We routinely request antibodies and protein samples and promise the donors to make available all the information we may have accumulated on that particular protein. For example, Table 1 lists entries available for lipocortin V (IEF SSP 3216), also known as annexin V, VAC-α, endonexin II, renocortin, chromobindin-5, anticoagulant protein PAP-I, calcimedin, IBC, etc.

As mentioned previously, one distinct advantage of 2-dimensional gel electrophoresis is the possibility of studying quantitative variations in cellular protein patterns that may lead to identification of groups of proteins that are expressed coordinately during a given biological process. Quantitation, however, is not an easy task, as reflected by the lack of published data on global cellular protein patterns. We believe this is partly due to difficulties in obtaining sets of gels that are suitable for computer analysis (streaking, proteins remaining at the origin, etc.) as well as to limitations (laborious editing time, need of calibration strips to merge images, limited dynamic range, etc.) in the computer analysis systems available at the moment. Perhaps the most advanced quantitative studies published so far using computer analysis have been carried out by Garrels and co-workers (18, 22). In particular, these investigators have established a quantitative rat protein database (18, 22) designed to study growth control (proliferation, growth inhibitors and stimulation) and transformation in well-defined groups of cell lines obtained by transformation of rat REF52 cells with SV40, adenovirus, and the Kirsten murine sarcoma virus. These studies have revealed clusters of proteins induced or repressed during growth to confluence as well as groups of transformation-sensitive proteins that respond in a differential fashion to transformation by DNA and RNA viruses. A most interesting feature of this quantitative database is the discovery of a group of coregulated proteins that show expression patterns similar to that of the cell cycle-regulated DNA replication protein cyclin (proliferating cell nuclear antigen, PCNA).

In our human databases, most quantitations have been carried out by estimating the radioactivity contained in the polypeptides by direct counting of the gel pieces in a scintillation counter (20, 21). Up to 700 proteins can be cut out through appropriately exposed films in a period of time comparable to that required for editing a synthetic image. Manual quantitation of this large number of spots is difficult without the assistance of a master reference image and a numbering system that can be used to identify the spots. Using this approach, we have recorded quantitative changes in the relative abundance of 592 [35S]methionine-labeled proteins synthesized by quiescent, proliferating, and SV40-transformed human embryonic lung MRC-5 fibroblasts (21). Some data concerning cytoskeletal and cytoskeletal-related proteins are presented in Fig. 2G. Our studies, as well as those of Garrels and co-workers (18, 22), may in the long run help define patterns of gene expression that are characteristic of the transformed state.
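Quantitation by scintillation counting reduces to normalizing counts. A minimal sketch, assuming each excised spot is expressed as background-corrected counts per minute over the total counts loaded, and that condition-to-condition changes are reported relative to one reference state. The ratios for lipocortin V in Table 1 (Q = 1.1, P = 1.0, T = 0.3) have this shape, though the exact normalization used is not stated. The spot counts in the usage lines are hypothetical.

```python
def percent_of_total(spot_cpm, total_cpm, background_cpm=0.0):
    """Background-corrected counts for each spot as a percentage of total counts loaded."""
    return {spot: 100.0 * max(cpm - background_cpm, 0.0) / total_cpm
            for spot, cpm in spot_cpm.items()}


def relative_to(levels, reference):
    """Express each condition's level relative to a reference condition."""
    ref = levels[reference]
    return {cond: value / ref for cond, value in levels.items()}


if __name__ == "__main__":
    # Hypothetical counts for two spots from one gel, with a blank gel piece as background
    print(percent_of_total({"3216": 1100.0, "9999": 60.0}, 1_000_000.0, background_cpm=50.0))
    # Hypothetical percentages for one spot in quiescent, proliferating and transformed cells
    print(relative_to({"Q": 0.121, "P": 0.110, "T": 0.033}, "P"))
```

Normalizing to total counts rather than to a single housekeeping spot keeps the values comparable across gels loaded with different amounts of label.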

OTHER 2-DIMENSIONAL GEL PROTEIN 
DATABASES 



As mentioned previously, there are other 2-dimensional gel databases available in computer form that have been published in extenso: these correspond to the E. coli K-12 protein-gene database (14, 23) and to the rat REF52 database (18, 22).

TABLE 1. Some entries for lipocortin V in the human AMA 2-dimensional gel protein database

Entries for lipocortin V (IEF SSP 3216) | Information entered
1. Protein name | Lipocortin V (…, renocortin, chromobindin-5, endonexin II, anticoagulant protein PAP-I, VAC-α, 35-γ-calcimedin, IBC, calphobindin I, anchorin CII, annexin V)
2. Percentage of total protein | 0.110% (about 2,800,000 molecules per cell)
3. Apparent molecular weight (Mr) | 33.3 kDa
4. Isoelectric point (pI) | 4.76
5. Method (or methods) of identification | Microsequencing, 2-dimensional immunoblotting, comigration
6. Credit to investigators that aided in identification | J. Vandekerckhove and colleagues, Rijksuniversiteit Gent; B. Pepinsky, BIOGEN, Cambridge; N. G. Ahn, University of Washington
7. Antibody against protein | Polyclonal (rabbit, antibody no. 20), B. Pepinsky, BIOGEN, Cambridge
8. Comigration with human proteins | Lipocortin V, N. G. Ahn, Howard Hughes Medical Institute, Washington University
9. Cellular localization | Subcortical membrane
10. Calcium/phospholipid-dependent membrane proteins | Lipocortin V
11. Function | Regulation of various aspects of inflammation, immune response, blood coagulation, and differentiation
12. Partial amino acid sequence | Known
13. cDNA sequence | R. Blake et al., J. Biol. Chem. 263, 10799-10811; 1988 (pI = 4.76 from translated sequence)
14. Levels in fetal human tissues | Adrenal glands = +++; brain = +++; cerebellum = +++; ear = +++; eye = +++; heart = +++; hypophysis = +++; liver = +++; lung = +++; meninges = +++; mesonephric tissue = +++; striated muscle = +++; pancreas = +++; skin = +++; spleen = +++; stomach = +++; submandibular gland = +++; small intestine = +++; thymus = +++; thyroid gland = +++; tongue = +++; ureter = +++
15. Levels in quiescent, proliferating, and transformed MRC-5 fibroblasts | Q (quiescent) = 1.1; P (proliferating) = 1.0; T (SV40 transformed) = 0.3
16. Distribution in Triton supernatant and cytoskeletons | Mainly supernatant

The E. coli K-12 cellular protein-gene database is perhaps the most complete of all databases reported so far, and eventually it should trace each protein back to its structural gene. Information contained in this database includes: gene/protein name (protein name, EC number, gene name); 2-dimensional gel spot designations (x-y coordinates from reference gels, alphanumeric designation); genetic information (linkage map location, physical map location, GenBank code, sequence reference, location on Kohara clones); biochemical information (molecular weight, pI, number of residues of each amino acid, mole percent of each amino acid, total number of amino acids in a polypeptide); and regulatory information (cellular level of protein in different media and at different temperatures, member of regulon, member of stimulon). Major advances of this database are envisaged in the future in view of the imminent sequencing of the whole E. coli genome as well as the development of improved methods to express cloned genes.

The rat REF52 2-dimensional gel protein database lists about 1600 proteins that have been recorded using the QUEST analysis system (18, 22). Included in this quantitative database are 1) protein names (cytoskeletal and heat shock proteins as well as various nuclear, mitochondrial, and cytoplasmic proteins), 2) annotations (subcellular localization, modification, recognition by specific antibodies, coprecipitation, NH2-terminal sequence, cross-reference to protein sequence information, and references to the literature), 3) protein sets (cytoskeletal proteins, phosphoproteins, sets of proteins with PCNA/cyclin-like properties, etc.), and 4) general quantitative data (protein synthesis during growth of normal REF52 cells to confluence and quiescence, and after restimulation of growth-inhibited cells).

In addition to the 2-dimensional gel databases mentioned so far, there are several smaller cellular databases being established in human (normal human diploid fibroblasts, lymphocytes, leukocytes, leukemic cells), mouse (NIH/3T3 cells, T lymphocytes), Aplysia, yeast (Saccharomyces cerevisiae), plants (wheat, barley, sorghum), and Euglena. Databases of tissue proteins (brain, whole mouse, liver) and body fluid proteins (plasma proteins, cerebrospinal fluid, urine, and milk) are being established in several laboratories. The reader is directed to the review by Celis et al. (4) for details and references concerning these databases.



MICROSEQUENCING HAS ADDED A NEW 
DIMENSION TO COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASES: A DIRECT 
LINK BETWEEN PROTEINS AND GENES 

The development of highly sensitive amino acid gas-phase or 
liquid-phase sequenators (24), together with the establish- 
ment of efficient protein and peptide sample preparation 
methods, has made it possible to perform a systematic
sequence analysis of proteins resolved by 2-dimensional gel
electrophoresis. The resulting partial protein sequences
can be used to search for protein identity (comparison
with sequences stored in databanks) as well as
for preparing specific DNA probes for cloning of as yet un- 
characterized proteins (Fig. 1). In addition, partial protein 
sequences can be stored in 2-dimensional gel databases (for 
example, see Fig. 2H) and offer a unique link between pro- 
teins and genes (Fig. 1). 
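A partial sequence obtained this way can be compared against stored sequences. The minimal sketch below does an exact-match lookup in a toy, in-memory databank (the entry names and sequences are invented). It treats 'X' as an unassigned sequencing cycle, which is an assumption of this example. It only illustrates the idea: practical databank searches use similarity scoring that tolerates mismatches rather than exact matching.

```python
import re


def find_matches(partial, databank):
    """Return (entry_name, 0-based offset) for each occurrence of a partial
    sequence in a databank of {name: sequence}. 'X' marks an unassigned
    cycle and matches any residue (an assumption of this sketch)."""
    # Build a regex; the lookahead lets overlapping occurrences be reported.
    body = "".join("." if aa == "X" else re.escape(aa) for aa in partial)
    pattern = re.compile("(?=" + body + ")")
    hits = []
    for name, seq in databank.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start()))
    return hits


# Toy databank; these sequences are invented for illustration.
toy_databank = {
    "toy_A": "MAAAKLLGGSTQWERK",
    "toy_B": "MPLVKWDDTTRRNQAG",
}
print(find_matches("LXGG", toy_databank))   # [('toy_A', 5)]
print(find_matches("KWDD", toy_databank))   # [('toy_B', 4)]
```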

In the early 1970s gel electrophoresis was used to purify proteins for sequencing purposes (reviewed by Weber and Osborn in ref. 25). Proteins were recovered by diffusion and sequenced by manual dansyl-Edman degradation at the nanomole level. This technique was further refined by using electro-elution to recover proteins and by miniaturizing the system (26). This method has been used extensively, but it showed increasing drawbacks (low yields, protein samples contaminated by free amino acids, and NH2-terminal blocking) as the amounts of protein handled gradually became smaller (e.g., at the 10-pmol level).

Most of the problems referred to above have been minimized with the introduction of protein-electroblotting procedures (27-32). When proteins are blotted onto chemically inert membranes, the immobilized proteins can be sequenced directly without additional manipulations. Depending on the amount of bound protein and its nature, this direct sequencing procedure generally yields NH2-terminal sequences of 10-40 residues. Accordingly, this technique was used to identify, by their NH2-terminal sequences, differentially expressed major proteins from total cellular extracts separated on 2-dimensional gels. A major difficulty encountered in this procedure is the frequent artefactual blockage of the proteins. Several studies suggest that this phenomenon is mainly due to reaction with contaminants (particularly unpolymerized acrylamide present in the gel) and to a high dilution of the protein (low concentration of protein per unit membrane surface). In addition to this primarily technical problem, many proteins are blocked in vivo by acylation or by a pyrrolidone carboxylic acid cap.

The problem of partial or complete NH2-terminal blockage can be circumvented by generating internal amino acid sequences. This is achieved by fragmenting the protein while it is still in the gel (gel in situ cleavage) or by cleaving it while it is bound to the membrane (membrane in situ cleavage) (33-35). In both cases, proteins are either cleaved in a restricted way (e.g., by limited enzymatic digestion or under restrictive chemical cleavage conditions) or fragmented into smaller peptides.
Of the different combinations examined, we obtained good results using exhaustive proteolytic digestion of membrane-immobilized proteins. This method has been described for Ponceau red-stained proteins on nitrocellulose blots (34), for Amido black-stained Immobilon-bound proteins, and for fluorescamine-detected proteins on glass fiber membranes (35). The proteases used (trypsin, chymotrypsin, or pepsin) cleave at multiple sites, generating small peptides that elute from the blot into the digestion buffer. The peptides are then purified from this buffer by reversed-phase high performance liquid chromatography (HPLC) and sequenced individually. Although each of these manipulations could be expected to reduce the yield of final sequence information, we were surprised that the peptides could be sequenced with high efficiency. In our hands, this approach could be routinely applied to gel-purified proteins available in amounts ranging from 5 to 10 µg, and it often yielded sequence information covering more than 30% of the total protein. Because membrane-immobilized proteins are not homogeneously digested, but rather show protease-sensitive regions next to resistant ones, the number of peptides generated is much lower than expected from the number of potential cleavage sites. Consequently, HPLC peptide chromatograms are less complex and most peptides can be recovered in pure form.
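The contrast between potential and actual cleavage sites can be made concrete for trypsin. The sketch below applies the commonly used simplified trypsin rule (cleave after K or R, but not before P) to list potential sites. As noted above, digestion of membrane-bound proteins yields far fewer peptides than this count suggests, so the result is best read as an upper bound. The example sequence is invented.

```python
def tryptic_sites(seq):
    """0-based positions at which trypsin could cut (the cut falls before
    seq[i]), using the simplified rule: after K or R, not before P."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def tryptic_peptides(seq):
    """Split seq at every potential site (complete, idealized digestion)."""
    cuts = [0] + tryptic_sites(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


example = "AKPGRMKDR"                 # invented sequence
print(tryptic_sites(example))         # [5, 7]
print(tryptic_peptides(example))      # ['AKPGR', 'MK', 'DR']
```

The K at position 1 is skipped because it is followed by P, and no cut is counted after the final residue.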

As only limited amounts of a protein mixture can be loaded on a 2-dimensional gel, proteins of interest are often obtained in yields insufficient for the currently available sequencing technology. More material can be obtained by enriching for a certain subcellular fraction (purified cell organelles) or by exploiting affinity (dyes, metals, drugs, etc.) or hydrophobic properties of proteins before gel analysis. All of the sequencing results accumulated so far in the human protein database (20) (a few are shown in Fig. 2H) have been obtained from analysis of protein spots collected from 2-dimensional gels that had been stained with Coomassie blue according to standard procedures and dried for storage. Proteins are recovered from the collected gel pieces by a protein-elution-concentration device, combined with gel electrophoresis and electroblotting. Details of this technique have been reported in a previous communication (42), and a brief outline is given below.

Combined gel pieces are allowed to swell in gel sample buffer (a total volume of 1.5 ml). The gel pieces together with the supernatant are then loaded into a large slot made in a new gel, and the slot is filled up with Sephadex G-10 equilibrated in gel sample buffer. During the subsequent electrophoresis, most of the electrical current passes along the sides of the slot instead of through it. This results in both vertical stacking and horizontal contraction of the protein band. With this device the protein is efficiently eluted from the gel pieces and concentrated from a large volume into a narrow spot. The highly concentrated protein spot (about 5 mm²) is then electroblotted onto PVDF membranes, stained with Amido black, and digested in situ with trypsin. The peptides generated during digestion elute from the membrane into the supernatant, where they can be separated by narrow-bore reversed-phase HPLC and collected individually for sequence analysis.

Using this and previous procedures (37, 39, 42), we have so far analyzed 70 protein spots collected from 2-dimensional gels (20, and unpublished observations) (see, for example, Fig. 2H). The sequence information amounts to 2100 allocated residues, corresponding to an average of 30 residues per protein spot. So far we have made cDNAs of many of the unknown proteins that have been microsequenced, and a substantial number have been cloned and sequenced. All available information indicates that it may be possible to obtain partial sequence information from most of the proteins that can be visualized by Coomassie Brilliant Blue staining.
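Two of the figures in this section are simple ratios. 2100 allocated residues over 70 spots gives 30 residues per spot, and "sequence information covering more than 30% of the total protein" is a coverage figure. When peptides overlap, coverage should count each residue only once. The sketch below (function names and example peptide positions are hypothetical) merges overlapping peptide intervals before dividing by protein length.

```python
def mean_residues_per_spot(total_residues, n_spots):
    return total_residues / n_spots


def coverage_percent(peptides, protein_length):
    """Percent of residues covered by sequenced peptides.
    peptides: (start, end) half-open residue ranges, possibly overlapping."""
    covered = 0
    last_end = 0  # furthest residue already counted
    for start, end in sorted(peptides):
        start = max(start, last_end)    # skip the part already counted
        if end > start:
            covered += end - start
            last_end = end
    return 100.0 * covered / protein_length


print(mean_residues_per_spot(2100, 70))                   # 30.0
print(coverage_percent([(0, 10), (2, 5), (3, 12)], 40))   # 30.0
```

In the example, the three peptides overlap and together span residues 0-11, so 12 of 40 residues (30%) are covered rather than the 24 a naive sum of lengths would suggest.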

Partial protein sequences are stored in the database as displayed in Fig. 2H, and it should be possible in the near future to interface this information with forthcoming DNA sequence data from the human genome project. In the long run, as the human genome sequences become available, it will be possible to assign partial protein sequences to genes for which the full DNA sequence and chromosomal location are known (Fig. 1).



SUMMARY 

The studies presented in this brief review are intended to 
demonstrate the usefulness of computer-aided 2-dimensional 
gel electrophoresis and microsequencing to analyze cellular 
protein patterns, and to link protein and DNA information. 
As more information is gathered worldwide, comprehensive databases will depict an integrated picture of the expression levels and properties of the thousands of proteins that orchestrate most cellular functions.

Clearly, databases allow easy access to a large body of data and provide an efficient medium to communicate standardized protein information. In the future, databases will foster a wide variety of biological information that can be used to support collaborative research projects in basic and applied biology as well as in clinical research (2, 5, 46). Once a protein is identified in a particular database, all the information gathered on it can be made available to the scientist. However, many problems must be solved before protein databases become of general use to the scientific community. A most urgent one is to promote standardization of the gel running conditions so that data produced in a given laboratory may be used worldwide. Surprisingly, gel running technology as it stands today is still a matter of craftsmanship.

Finally, comprehensive, computerized databases of proteins, together with recently developed techniques to microsequence proteins, offer a new dimension to the study of genome organization and function (Fig. 1). In particular, human protein databases may become increasingly important in view of the concerted effort to map and sequence the entire human genome. This formidable task is expected to dominate biological research in the next decades.
Preparation of human tumors for analysis by 2-D electrophoresis 1045

Nonenzymatic extraction of cells from clinical tumor 
material for analysis of gene expression by two- 
dimensional polyacrylamide gel electrophoresis 

We have compared different methods of preparing malignant cells for two-dimensional electrophoresis (2-DE). We found all methods using fresh tissue to be superior to methods using frozen tissue. Our results indicate that nonenzymatic methods of preparing tumor cells, including fine needle aspiration, scraping and squeezing, have advantages over methods using enzymatic extraction of cells. Nonenzymatic methods are rapid, appear to reduce the loss of high molecular weight protein species, and alleviate the necessity of separating viable and nonviable cells by Percoll gradient centrifugation. Using these techniques, high-quality 2-DE maps were derived from tumors of the lung and breast. In the resulting polypeptide patterns, heat shock proteins, non-muscle tropomyosins and intermediate filament proteins were identified. We conclude that nonenzymatic extraction of malignant cells from fresh tumor tissue improves the prospects that these techniques may be useful in clinical diagnosis.



1 Introduction 

Tumors may develop by a number of different mechanisms in any given cell type. At the time of diagnosis, tumors will have progressed along different pathways to various stages of malignancy. To provide a basis for individual therapy, it is important to examine specific properties of the tumor cell population in each patient. A large number of different markers have been described in order to increase diagnostic accuracy. It is likely that a combination of several markers will be needed in the future to reflect different properties of the tumor. One important method for the resolution of a large number of potential markers is two-dimensional electrophoresis (2-DE). Extensive efforts are being made to identify various polypeptides separated by 2-DE and to characterize how the expression of these polypeptides is affected by cellular transformation and various culture conditions [1, 2]. It would be valuable to transfer this information to 2-DE separations of polypeptides from tumor tissue samples. However, one prerequisite is that 2-DE gels from tumor samples are comparable in quality with 2-DE gels from samples of cultured cells.

Frozen tumor tissues are commonly used for various biochemical assessments. However, if such samples are analyzed by 2-D polyacrylamide gel electrophoresis (PAGE), the polypeptide patterns are obscured by contaminating serum and connective tissue proteins. Such variations, which are unrelated to the tumor cells, represent serious problems in the interpretation and inter-patient comparison of 2-DE patterns [3]. 2-DE patterns of cells prepared from fresh tumor material have been analyzed after enzymatic extraction of tumor cells [4, 5] or after culturing tumor fragments in medium containing radioactive amino acids [6]. These procedures may, however, lead to alterations in the gene expression/polypeptide patterns. We are aware of only one study in which nonenzymatic extraction of cells from fresh tumor tissue (prostate cancer) was used to prepare samples for 2-D PAGE [4]. We have examined enzymatic extraction and various nonenzymatic techniques, including fine needle aspiration, for the preparation of cells from fresh tumor tissues. We describe nonenzymatic extraction procedures that are rapid, lead to high-quality 2-DE patterns, and alleviate the necessity of purifying tumor cell populations from dead cells.



2 Materials and methods 

2.1 Cell cultures and samples used for spot 
identification 

A rat embryonal fibroblast cell line, WT2 (a kind gift from Dr. J. I. Garrels and Dr. S. Patterson), was used for the identification of a number of heat shock and structural proteins. Human normal diploid lung fibroblasts (WI38) and human epithelial breast carcinoma cells (MDA-231 and MCF-7) were purchased from ATCC and grown as recommended. Polypeptides prepared from a leukemia of the pre-B-ALL type were separated by 2-DE. The 2-DE map was then analyzed by Dr. S. M. Hanash (University of Michigan, Ann Arbor, USA).

2.2 Tumor tissue samples

In this study, 2-DE maps from seven tumors were used as representative illustrations: two adenocarcinomas of the lung (LA and LB, mucinous, both of intermediate grade of differentiation), one squamous carcinoma of the lung (LS), one carcinoid-like breast cancer (BC), one microfollicular adenoma (highly differentiated) of the thyroid (TA), one highly differentiated hypernephroma, a tumor of the kidney (KH), and finally one case of poorly differentiated corpus carcinoma (CP).

2.3 Preparation of cultured cells 

The cell monolayers were washed twice in phosphate-buffered saline (PBS) and then scraped off in ice-cold PBS including protease inhibitors (PIH: 0.2 mM phenylmethylsulfonyl fluoride (PMSF) and 0.83 mM benzamidine). The cells were pelleted at 660 × g for 3 min (+4 °C) and washed once more before final centrifugation at 2700 × g for 5 min. The wet weight of the cell pellet was recorded and the cells were stored at -80 °C until further processing.

2.4 Preparation of tumor tissue samples 

2.4.1 General remarks 

Macroscopically representative and non-necrotic tumor tissues were selected within 20 min after resection. Parallel samples were routinely prepared for cytology. The samples were processed as rapidly as possible on ice or at +4 °C and in the presence of PIH. Cells were stained with Diff-Quick (Baxter) and usually examined on three different occasions during the preparation procedure: (i) cytology sample, (ii) extracted cells and (iii) cells after Percoll gradient centrifugation.

2.4.2 Specimen acquisition 

The strategy of sample preparation is shown in Fig. 1. Tumor tissue cell samples were usually obtained by fine needle aspiration (NA) using a 0.7 mm needle. The syringe was filled with 1-2 mL of ice-cold culture medium/PIH. We found that if a tumor appeared to be very fibrous, it was difficult to extract enough cells for 2-DE analysis. In these cases, two alternative techniques were examined. (i) The tumor was cut in the middle and the fresh surface was scraped (SC) with a scalpel. The cell-rich material was then transferred to ice-cold culture medium (L15 with 5% fetal calf serum)/PIH. (ii) A part of the tumor sample was placed in culture medium on ice for further processing in the laboratory as follows: the material was cut into very small fragments on a pre-cooled dissection plate and transferred to a small glass chamber with a 0.7 mm metal net 5 mm above the bottom of the chamber. Medium/PIH (8 mL) was added to cover the sample, which was gently squeezed (SQ) against the net in order to release and wash out cells. NA and SC were also compared with an enzymatic extraction (EE) procedure described previously [5]. Briefly, thin slices of tissue were incubated with collagenase (1 mg/mL) and elastase (2 mg/mL) in medium for 1 h at 37 °C. Extracted cells from every sample were then subjected to Percoll gradient centrifugation (Section 3.2.3).

2.4.3 Separation of cells by Percoll gradient 
centrifugation 

The cell suspension was filtered through two nylon mesh filters, (i) 250 µm and (ii) 100 µm, and then centrifuged at 660 × g for 3 min. The cell pellet was carefully resuspended in medium using a syringe and loaded onto a two-step discontinuous Percoll/PBS gradient, 20.4% (density = 1.03 g/mL) and 54.7% (density = 1.07 g/mL), and centrifuged at 1000 × g for 15 min. In this system, dead cells stay on top, viable cells sediment to the interphase and erythrocytes sediment to the bottom. The viability of cells in the top fraction and interphase was checked by the trypan blue exclusion test. The interphase cell layer (> 90% viability) was collected and washed once in a large volume of PBS/PIH (centrifuged at 800 × g for 3 min). Finally, the cells were resuspended in 1.4 mL PBS and pelleted at 2700 × g for 5 min. The wet weight (WW) was recorded and the pellet was then stored at -80 °C.
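The density of each Percoll step can be estimated from its Percoll content by treating the mixture as a linear blend of Percoll stock and diluent. The sketch below assumes a stock density of 1.130 g/mL and a PBS density of about 1.005 g/mL. Both are assumptions of this example, not values stated in the text. Under these assumptions a 20.4% Percoll step comes out close to the 1.03 g/mL given for the upper layer.

```python
# Assumed densities (g/mL); neither value is given in the text.
PERCOLL_STOCK = 1.130
PBS = 1.005


def percoll_density(percent, stock=PERCOLL_STOCK, diluent=PBS):
    """Approximate density of a Percoll/diluent mix by linear mixing."""
    f = percent / 100.0
    return f * stock + (1.0 - f) * diluent


def percoll_percent_for(density, stock=PERCOLL_STOCK, diluent=PBS):
    """Inverse: Percoll percentage (v/v) needed for a target density."""
    return 100.0 * (density - diluent) / (stock - diluent)


print(f"{percoll_density(20.4):.2f} g/mL")   # 1.03 g/mL
```

The linear model is only an approximation; measured densities depend on the actual stock lot and on the diluent used.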

2.4.4 Final preparation of cells for 2-D PAGE analysis 

From this point, cultured cell samples were treated in the same way as tumor cell samples. Each cell pellet was thawed on ice and resuspended in 1.89 µL Milli-Q water per mg WW (= 1.89 × WW µL). The suspension was frozen and thawed 4-5 times to break the cells [7]. A volume of (0.089 × WW) µL 10% sodium dodecyl sulfate (SDS), including 33.3 [unit illegible] mercaptoethanol, was mixed with the sample and incubated for 5 min on ice with (0.329 × WW) µL of a solution of DNase I (0.144 mg/mL in 20 mM Tris-HCl with 2 mM CaCl2·2H2O, pH 8.8) and RNase A (0.0718 mg/mL Tris) [8, 9]. The sample was frozen and lyophilized. Sample buffer [10] including PMSF (0.2 mM), EDTA (1.0 mM), 0.5% Nonidet P-40 (NP-40), and 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS; 25 mM) was added carefully, mixed for 2.5 h and centrifuged for 15 min at 10000 rpm to remove any insoluble material. Duplicate or triplicate samples were taken for protein determination [11]. Samples were stored at -80 °C prior to isoelectric focusing (IEF).

Figure 1. Experimental flow chart showing the main steps of the preparation procedures. The abbreviations used for nonenzymatic extraction procedures are: FZ, frozen sample preparation; NA, needle aspiration; SC, scraped; and SQ, squeezed sample. Extracted cells are then loaded as a suspension (top volume of each tube) onto either 1.07 g/mL Percoll (left) or a discontinuous Percoll gradient, from the nonenzymatic extraction (middle) or from enzymatic extraction (right). Cellular top and interphase fractions are then used for 2-DE. For details see Section 2.
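Every reagent volume in the final cell preparation step is scaled to the pellet wet weight, so the additions can be tabulated with a small helper. The sketch below uses the per-milligram coefficients given in Section 2.4.4 (1.89, 0.089 and 0.329). It assumes all three volumes are in microlitres and the wet weight is in milligrams, and uses a hypothetical 100 mg pellet as the example.

```python
# Coefficients (µL per mg wet weight) from the lysis protocol.
COEFFICIENTS_UL_PER_MG = {
    "water": 1.89,          # resuspension in water before freeze-thaw
    "sds_10pct": 0.089,     # 10% SDS solution with mercaptoethanol
    "nuclease_mix": 0.329,  # DNase I / RNase A solution
}


def lysis_volumes(wet_weight_mg):
    """Return reagent volumes (µL) for a cell pellet of the given wet weight."""
    if wet_weight_mg <= 0:
        raise ValueError("wet weight must be positive")
    return {name: k * wet_weight_mg for name, k in COEFFICIENTS_UL_PER_MG.items()}


for reagent, volume in lysis_volumes(100).items():   # hypothetical 100 mg pellet
    print(f"{reagent}: {volume:.1f} µL")
```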
2.4.5 Preparation of frozen tumor tissue

The technique has been described previously [3, 12]. Briefly, the sample is ground frozen to a fine powder, homogenized, lyophilized and solubilized in sample buffer.

2.4.6 Control of representativity 

The tumors were examined routinely by experienced 
pathologists and smears or imprints from the samples 
were also assessed for cytometric DNA content by 
microspectrophotometry. 

2.5 2-D PAGE 

2-D PAGE was performed as described [8, 10] except for the following details. The glass tubes for IEF, 1.2 × 200 mm, contained 2.0% Resolyte, pH 4-8 (BDH), and were cast to a height of 180 mm. A stock solution of acrylamide (Serva) and N,N'-methylenebisacrylamide (16.7:1 for IEF and 37.5:1 for the second dimension) was deionized by mixing with 5% w/v Duolite MB 5313 mixed-resin ion exchanger (BDH) for 30 min, filtered (with a 0.22 µm nitrocellulose filter) and stored at -70 °C. N,N'-Methylenebisacrylamide, N,N,N',N'-tetramethylethylenediamine (TEMED) and ammonium persulfate were purchased from Bio-Rad. IEF tubes were prefocused at 200 V for 60 min. To each tube a sample corresponding to 20-40 µg protein was applied and focused for 14.5 h at 800 V and finally for 1.0 h at 1000 V using a Protean II cell (Bio-Rad) and a Model 1000/500 Power Supply (Bio-Rad). The tube gels were finally extruded into 1.25 mL equilibration buffer containing 60 mM Tris, pH 6.8 (2% SDS, 100 mM dithiothreitol and 10% glycerol), frozen on dry ice and stored at -70 °C. The second-dimension gels (1.0 × 180 × 90 mm) had an acrylamide concentration of 10% T and contained 376 mM Tris, pH 8.8, and 0.1% SDS. IEF gels were applied on top of the slab gel, sealed with 0.5% agarose containing electrophoresis running buffer (60 mM Tris base, 0.2 M glycine and 0.1% SDS) and electrophoresed at 10-11 mA per gel (constant current) at +10 °C. Six gels were run together in a Protean II xi 2-D Multi-Cell (Bio-Rad). Proteins were visualized by silver staining and photographed with the acidic side to the left [13, 14].

2.6 Identification of polypeptides 

Vimentin and vimentin-derived polypeptides were identified by extraction of an MDA-231 cell lysate with 0.6 M KCl/0.5% NP-40 [15]. Tropomyosins were extracted from MDA-231 and WI38 cell lysates [16], and cytokeratins were extracted from MDA-231 and MCF-7 cell lysates [17]. The patterns were compared with published maps [19-21]. Proliferating cell nuclear antigen (PCNA) was identified by immunoblotting (PC10 mAb, Dakopatts) using a semidry system (Multiphor II NovaBlot, Pharmacia-LKB Biotechnology AB) and enhanced chemiluminescence (ECL) detection (Amersham).

3 Results 

3.1 2-DE of samples prepared from normal and tumorigenic cultured cells

The object of this study was to develop methods for preparing 2-DE maps from human tumor tissue with the same high resolution as those obtained from cultured cells. Shown in Fig. 2 are high-resolution 2-DE gels prepared from cultured cells and from one leukemia: SV40-transformed embryonal rat fibroblasts WT2 (Fig. 2a); human MDA-231 breast carcinoma cells (Fig. 2b); human WI38 fibroblasts (Fig. 2c); and human pre-B-ALL cells (Fig. 2d). Polypeptides were identified through a laboratory exchange of cell samples/2-DE maps and through 2-DE analysis of purified proteins (Table 1).

Figure 3. 2-DE analysis of a case of lung adenocarcinoma (LA). Comparison of 2-DE gel quality between (A) frozen and (B) fresh (needle aspiration) tissue preparation.

3.2 Preparation of samples from solid tumors 
3.2.1 Fresh versus frozen tissue 

An adenocarcinoma of the lung (LA) was prepared for 2-DE by conventional methods using frozen material (Fig. 3a). There are several possible reasons for the poor resolution obtained with frozen tissue, including the presence of high molecular weight protein aggregates. Filtering extracts through 0.1 µm filters (Durapore, Millipore) resulted in slightly improved resolution (not shown). When fresh tumor tissue from tumor LA was used for sample preparation, with fine needle aspiration to collect the cells, the resolution was considerably improved (Fig. 3b). The use of fresh tissue resulted in a general increase in resolution, which was most pronounced in the 50-100 kDa molecular mass range. A number of differences in the protein profiles of the gels in Figs. 3a and 3b can be observed, some of which are indicated in the figures. The decrease in serum albumin in Fig. 3b is likely to result from loss of serum proteins when cells were pelleted after aspiration. Other differences, such as the decreased level of transformation-sensitive tropomyosins (TM1-TM3), may result from enrichment of tumor cells in the sample of Fig. 3b. Fine needle aspiration, a well-established technique in cytology, extracts mainly tumor cells because neoplastic cells show decreased intercellular adhesiveness compared with normal tissue. Microscopic examination of Diff-Quick-stained extracted cells from case LA revealed almost 100% tumor cells, whereas the whole tissue extract contained approximately 60% tumor cells.



Table 1. Names and abbreviations for identified spots

Spot     Name                                    Basis for identification
A        Actins                                  a
aA       α-Actinin                               a
B23      Protein B23/Numatrin                    a
EF2      Elongation factor 2                     a
EF1      Elongation factor 1δ                    a
GT       Glutathione S-transferase (pi)          a
hsp60    Heat shock protein 60                   a
hsp73    Heat shock protein 73                   a
hsp80    Heat shock protein 80, GRP78, BiP       a
hsp90    Heat shock protein 90                   a
hsp100   Heat shock protein 100, Endoplasmin     a
IFa      Intermediate filament-associated        a
k8       Cytokeratin 8                           b and a
LamB     Lamin B                                 a
Lip1     Lipocortin I                            a
Lip2     Lipocortin II                           a
Lip5     Lipocortin V                            a
Mit1     Mitcon 1/β-F1 ATPase                    a
Mit2     Mitcon 2                                a
Mit3     Mitcon 3                                a
MRP      Mucin-related polypeptides
pcna     Proliferating cell nuclear antigen      c and a
PLC      Phospholipase C (1)                     a
RO       Ro/SS-A antigen                         a
SA       Serum albumin                           b and a
aT       α-Tubulin                               a
bT       β-Tubulin                               a
tm1      Non-muscle tropomyosin isoform 1        b and a
tm2      Non-muscle tropomyosin isoform 2        b and a
tm3      Non-muscle tropomyosin isoform 3        b and a
tm4      Non-muscle tropomyosin isoform 4        b and a
tm5      Non-muscle tropomyosin isoform 5        b and a
TPI      Triose phosphate isomerase              a
V        Vimentin                                b and a
Vid1     Vimentin-derived protein                b and a
Vid2     Vimentin-derived protein                b and a
Vid3     Vimentin-derived protein                b and a
Vid4     Vimentin-derived protein                b and a
Vin      Vinculin                                a

a. Homologous position with respect to other mammalian systems.
b. Purified protein(s).
c. Immunoblotting.
Figure 4. 2-DE analysis of a case of breast carcinoma (BC). Comparison of 2-DE quality and some differences in detected spots (arrowheads indicate increased intensity and circles or brackets indicate decreased intensity of the same spots) between (A) enzymatically and (B) nonenzymatically (scraped) prepared tissue.
3.2.2 Comparison of different methods for preparing cells from fresh tumor tissue

Samples were prepared from breast and lung carcinomas using either an enzymatic treatment with collagenase/elastase or nonenzymatic preparations (Fig. 4). A number of differences in the protein profiles were observed in the resulting 2-DE gels, some of which are indicated in Figs. 4a and b. These differences include both increases and decreases in spot intensity. They may result from degradation of high molecular weight polypeptides during enzymatic treatment or from increased solubilization of polypeptides, or they may have other causes. For many tumors, it was only possible to obtain small amounts of material, since the tumors were reserved for other examinations. In these cases, samples could be prepared for 2-DE using either needle aspiration or scraping. Figure 5a shows a 2-DE gel prepared from squamous lung carcinoma (LS) cells collected by needle aspiration, and Fig. 5b shows a gel prepared from the same tumor by scraping. In this case, a number of differences were recorded between the two procedures, some of which are arrowed in Fig. 5. Samples obtained from other tumors (breast and lung) generally showed fewer differences between these two methods of cell sampling (not shown). These data show that different nonenzymatic extraction procedures may yield different polypeptide patterns. However, the number of spots with a large difference in intensity was lower than when a nonenzymatic preparation was compared with an enzymatic preparation.

Figure 5. 2-DE analysis of a case of lung cancer (LS). Comparison of 2-DE gel quality and detected spots (arrowheads and circles) between (A) aspirated (needle aspiration) and (B) scraped preparations from fresh tissue.

Figure 6. 2-DE analysis of three other types of tumors: (A) hypernephroma, (B) an adenoma of the thyroid and (C) corpus cancer, using the nonenzymatic preparation technique. Arrowheads and circles indicate some cytosolic polypeptides.

2-DE maps of satisfactory quality were also prepared by a third procedure, in which cells were released from small pieces of tumor by squeezing (see Section 2). Examples are shown in Fig. 6, which presents 2-DE maps derived from a case of hypernephroma, KH (Fig. 6a), a case of thyroid tumor, TA (Fig. 6b) and a case of corpus cancer, CP (Fig. 6c). We conclude that nonenzymatic techniques are useful for 2-DE analysis of a number of different tumors. The quality of the resulting gels is comparable to that obtained using cultured cells (compare the gels in Fig. 2 with those in Figs. 4, 6 and 7). In our experience, which of these methods is optimal depends on the tumor material. For example, very small tumors are preferably extracted by squeezing; on the other hand, breast cancers (which are often fibrous) yield satisfactory samples using scraping.

3.2.3 Purification of cells on Percoll gradients

We considered the possible advantage of separating viable cells from dead cells, erythrocytes, and debris using discontinuous Percoll gradients. Cells collected from the interphase showed a viability of more than 90% as judged by the trypan blue exclusion test. However, the yield of viable cells decreased dramatically if the resected tissue was not processed immediately. To study the effect of cell lysis during the preparation procedure, 2-DE maps were prepared from nonenzymatically extracted cells of case LB collected from the top fraction (nonviable, Fig. 7a) and the interphase fraction (viable, Fig. 7b). These 2-DE maps were compared with the corresponding fractions (nonviable, Fig. 7c, and viable, Fig. 7d) of enzymatically extracted cells. One clear disadvantage of the enzymatic technique was that when loss of cell viability occurred during preparation, a dramatic loss of high molecular weight polypeptides was observed (Fig. 7c). This was probably due to degradation of intracellular proteins. In contrast, nonenzymatic preparations showed fewer differences between viable and nonviable cells: the most pronounced alteration was a decrease of a group of mucin-related proteins (Fig. 7b). We conclude, therefore, that a discontinuous Percoll gradient is necessary after enzymatic extraction of cells, but can be omitted from the nonenzymatic tumor sample preparation procedure.

Figure 7. 2-DE analysis of polypeptides from viable (b and d) and nonviable (a and c) cells of an adenocarcinoma of the lung (LB) separated using a discontinuous Percoll density gradient. The nonenzymatic (a and b) and enzymatic (c and d) preparation techniques are compared.

We used the MDA-231 cell line to study the effects of cell lysis and leakage of cytosolic polypeptides during sample preparation. Remarkably, after 30, 50, 80 and 140 min of incubation in PBS/PIH at 0 °C, no significant changes were observed in the 2-DE pattern (not shown). Although loss of cell viability may not result in protein degradation when cells are incubated in the presence of protease inhibitors, loss of cytosolic proteins would be expected during pelleting of cells. We therefore monitored the loss of lactate dehydrogenase (LDH) activity into the supernatant during incubation of MDA-231 and MCF-7 breast cancer cells in PBS at 20 °C. In both cases, loss of viability was paralleled by release of LDH from the cells (Fig. 8). After 5 h, 70% of the MCF-7 cells but only 30% of the MDA-231 cells were dead (not shown).
Figure 8. The relative release (fraction in supernatant of total) of lactate dehydrogenase (LDH) activity and cell viability versus incubation time of the mammary carcinoma cell lines MDA-231 and MCF-7 during incubation in PBS at 20 °C.
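The "relative release" in Fig. 8 is the LDH activity found in the supernatant expressed as a fraction of the total activity, and viability is the fraction of cells excluding trypan blue. The minimal Python sketch below shows both ratios. It assumes that total LDH activity is the sum of the supernatant activity and the cell-associated activity (measured after lysing the pellet, in the same units and volume-corrected). The function names and the example numbers are hypothetical and are not data from this study.

```python
def relative_ldh_release(supernatant_activity: float, cell_associated_activity: float) -> float:
    """Fraction of total LDH activity found in the supernatant.

    Assumption: total activity = supernatant + cell-associated (pellet lysate),
    both expressed in the same units and corrected for volume.
    """
    total = supernatant_activity + cell_associated_activity
    if total <= 0:
        raise ValueError("total LDH activity must be positive")
    return supernatant_activity / total


def trypan_blue_viability(unstained_cells: int, stained_cells: int) -> float:
    """Fraction of counted cells that exclude trypan blue (counted as viable)."""
    total = unstained_cells + stained_cells
    if total == 0:
        raise ValueError("no cells counted")
    return unstained_cells / total


if __name__ == "__main__":
    # Hypothetical readings at one time point, for illustration only.
    release = relative_ldh_release(supernatant_activity=30.0, cell_associated_activity=70.0)
    viability = trypan_blue_viability(unstained_cells=92, stained_cells=8)
    print(f"relative LDH release: {release:.2f}, viability: {viability:.2f}")
```

Plotting both ratios against incubation time for each cell line gives curves of the kind summarized in Fig. 8; parallel curves indicate that LDH leaks out as cells lose viability.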




These data underline the importance of processing fresh tumor samples rapidly and at low temperature. Experiments have also been performed using only 1.07 g/mL Percoll (Fig. 6c and Fig. 1, left test tube) in order to remove erythrocytes. One clear advantage of this procedure, which is now used routinely, is a higher yield of viable cells, probably due to the shorter sample preparation time.



4 Discussion 

We have described procedures for preparing samples from solid tumors for 2-DE. The 2-DE maps derived from solid tumors were similar in quality to those obtained from cultured cells. Compared with methods using frozen material, the resolving power of the 2-DE technique is increased, allowing examination of a large number of polypeptides from tumors of different malignancies. Other investigators [12, 22] have used samples from frozen tumors to derive 2-DE maps. We have previously described disadvantages encountered using frozen tumor samples, including variations in contaminating proteins between different samples [3]. The methods described here are based on the preparation of cells from tumors without enzymatic digestion. The enzymatic step could be avoided since malignant cells usually grow as solid masses which are not strongly attached to the matrix. Furthermore, we found that omitting the enzymatic digestion removed the need to purify viable tumor cells on Percoll gradients. This was in sharp contrast to enzymatically treated samples, where loss of viability leads to loss of high molecular weight proteins (Fig. 7c).

At least in the case of lung cancer, viable and nonviable cells showed only small differences in their 2-DE maps. Presumably, protease inhibitors penetrate the cells and inhibit proteolysis. In model experiments, we observed leakage of a cytosolic protein (LDH) from the cells in parallel with loss of viability. However, silver staining combined with visual inspection detected only a limited decrease in the level of low molecular weight cytosolic polypeptides. We have found that although some tumors are well suited for the preparation procedure described, others are not. In general, good results were obtained using tumors of the lung, breast and corpus, and lymphomas. In contrast, cells from thyroid adenomas and hypernephroma showed poor viability. In these cases we were unable to separate nonviable cells from viable cells, and we therefore cannot evaluate the consequences of the loss of viability on 2-DE patterns, apart from a loss of some low molecular weight cytosolic polypeptides.

Highly differentiated tumors may show lower viability than poorly differentiated tumors (Dr. Farkas Vanky, personal communication). A number of samples from thyroid tumors were prepared for 2-DE, but most cases showed poor viability. We believe that special care is needed during preparation of highly differentiated tumor groups in general. The difference in loss of viability and leakage of LDH between the more differentiated MCF-7 cells and the less differentiated MDA-231 cells is in line with these observations (Fig. 8). A number of potentially interesting markers, such as tropomyosin isoforms, cytokeratins and heat shock proteins, appear to be insensitive to loss of viability during the preparation procedure. To date, we have made numerous observations of alterations in the expression of these polypeptides in breast cancers and lung cancers.

Another problem that may occur, irrespective of the sample preparation technique used, is admixture of lymphocytes. Such cases are easily detected in smears, and it may therefore be possible to select lymphocyte-specific spots as "internal markers" for the 2-D PAGE analysis. Studies using this approach are in progress. Many of the polypeptides identified are structural (Table 1). Since the expression of many of these polypeptides is known to vary between normal and malignant cells, the possibility of determining their expression simultaneously is appealing. In the specific case of breast cancer, alterations in the expression of intermediate filament proteins (cytokeratins) are known to occur during tumor progression [23]. Other proteins known to be differentially expressed between normal and transformed cells are tropomyosins, numatrin/B23, heat shock proteins and PCNA. Consistent with this, we have observed alterations in the expression of cytokeratin 8, hsp 90 and nonmuscle tropomyosin isoform 2 during malignant progression (Okuzawa et al., in preparation; Franzén et al., in preparation).

The method of choice for sample preparation from tumor tissues will depend on the properties of the tumor material studied. Since differences were observed between methods, it may be important to use only one method when comparing cases within one group. The advantages of the nonenzymatic techniques are (i) that they minimize contamination with connective tissue, (ii) that problems with contamination by serum proteins are avoided, and (iii) that separation of viable and dead cells is not necessary. In this way the resolving power of 2-D PAGE is maximized for the analysis of human tumors, and studies on inter-tumor variations in gene expression are facilitated. In addition, the polypeptide patterns obtained may be more representative of the in vivo tumor cell, since the use of enzymes and incubations has been minimized.

We would like to thank Dr. J. A. Garrets, Dr. S. Patrersson, Dr. S. M. Hanash and Dr. J. E. Celis for making sample and 2-DE map exchanges possible. This study was sup-
Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions

A highly reproducible, commercial, nonlinear, wide-range immobilized pH gradient (IPG) was used to generate two-dimensional (2-D) gel maps of [35S]methionine-labeled proteins from noncultured, unfractionated normal human epidermal keratinocytes. Forty-one proteins, common to most human cell types and recorded in the human keratinocyte 2-D gel protein database, were identified in the 2-D gel maps, and their isoelectric points (pI) were determined using narrow-range IPGs. The latter established a pH scale that allowed comparisons between 2-D gel maps generated either with other IPGs in the first dimension or with different human protein samples. Of the 41 proteins identified, a subset of 18 was defined as suitable to evaluate the correlation between calculated and experimental pI values for polypeptides with known composition. The variance calculated for the discrepancies between calculated and experimental pI values for these proteins was 0.001 pH units. Comparison of the values by the t-test for dependent samples (paired t-test) gave a p-level of 0.49, indicating that there is no significant difference between the calculated and experimental pI values. The precision of the calculated values depended on the buffer capacity of the proteins and, on average, improved with increased buffer capacity. As shown here, the widely available information on protein sequences cannot, a priori, be assumed to be sufficient for calculating pI values, because post-translational modifications, in particular N-terminal blockage, pose a major problem. Of the 36 proteins analyzed in this study, 18-20 were found to be N-terminally blocked, and of these only 6 were indicated as such in databases. The probability of N-terminal blockage depended on the nature of the N-terminal group. Twenty-six of the proteins had M, S or A as the N-terminal amino acid, and of these 17-19 were blocked. Only 1 of the 10 proteins with other N-terminal groups was blocked.



1 Introduction 

Compared with carrier ampholyte isoelectric focusing (CA-IEF), the use of immobilized pH gradients (IPGs) in the first dimension of 2-D gel electrophoresis offers improved reproducibility [1], because the nature of the pH gradient makes the resulting focusing positions insensitive to the focusing time [2] and to the type of sample applied [3]. The recently introduced ready-made IPG strips [4] seem to be an ideal substitute for the carrier ampholyte gradients, which until now have been the most commonly used first dimensions in 2-D gel electrophoresis. The availability of standardized first dimensions opens the possibility of comparing 2-D gel maps of various cell types generated in different laboratories, provided that the focusing positions of a number of easily recognizable polypeptide spots common to the cell types in question are known. Even though this approach is limited to experiments performed with the same standardized IPG, the flexibility provided by IPGs allows the pH gradient to be adjusted to the requirements of a particular experiment.

Exchange and communication of 2-D gel protein data require a pH scale that is independent of the particular IPG used and by which the results can be described. The introduction of carbamylation trains, and the relation of focusing positions to the spots in these trains, represented a step forward towards solving the reproducibility problem experienced with carrier ampholyte focusing [5]. Problems associated with the use of carbamylation trains were mainly due to lack of temperature control and to the use of nonequilibrium focusing conditions. Accordingly, the pattern variation involved not only the resulting pH gradients, but also the positions of the spots relative to each other and to the spots in the carbamylation trains. Even though the question of reproducibility has, to a large extent, been solved, the carbamylation trains are still not ideal as markers, because the spots in the trains do not represent defined entities but rather a large number of differently carbamylated peptides having close pI values. As a result, the spots are large and poorly defined compared with the ordinary polypeptide spots in 2-D gel maps.
Neidhardt et al. [6] defined the pH gradient in 2-D gel experiments by pI markers whose pI values were calculated from the amino acid composition. Focusing positions of other polypeptides could be predicted from their composition, but the pK values needed for the pI calculations were unknown. Various groups employing this approach do not use the same pK values [6, 7], and therefore the pI values derived in this way cannot be expected to describe the variation of the hydrogen ion activity. In spite of this, it is still possible to make approximate predictions of focusing positions, because the pK values used to define the pH gradient are also used to calculate pI values and to predict the focusing positions; errors in the pK assignments therefore cancel out. A pH scale which correctly reflects the variation in hydrogen ion activity during focusing should improve the precision of the predictions, but this has never been implemented with CA-IEF as the first dimension in 2-D gel electrophoresis. The main reason is the difficulty of measuring pH in focused gels containing high concentrations of urea.

IPGs can be described from the concentration variation of the immobilized groups, provided that the pK values of these groups are known for the conditions prevailing during focusing. To avoid measurements on gels, Gianazza et al. [8] suggested the use of pK values derived by adding experimentally determined pK shifts. Recently, direct determinations of pK differences between immobilized groups in IPGs were made by determining pI-pK values in overlapping narrow-range IPGs [9, 10], and the results verified the applicability of the Gianazza approach. Describing the focusing results in a pH scale that correctly reflects the variation of the hydrogen ion activity under the focusing conditions used not only allows the comparison of 2-D gel maps generated with different IPGs, but also opens the possibility of correlating the focusing position of a polypeptide with its composition [9]. Experiments by Bjellqvist et al. [9, 10] have implied that pH scales showing good correlation between calculated and experimental pI values can be derived for any of the conditions commonly used for focusing in connection with 2-D gel electrophoresis. These pH scales are then defined through the pK values of the immobilized groups in the IPG-containing gel. To be useful for interlaboratory comparisons, however, the pH scale has to be defined through the pI values of easily recognizable spots present in the 2-D gel map. So far, pI determinations in a useful pH scale, combined with determinations of the pK values needed for pI calculations, have only been made for the pH range 4.5-6.5 at 10 °C [9]. CA-IEF as described by O'Farrell [11] does not control the temperature of the first dimension, which can be expected to be slightly above room temperature. With IPGs, the temperature commonly used is about 20 °C [4, 12] or 25 °C [13], and this is a critical parameter that needs to be controlled [14].

The present work was designed to compare 2-D gel maps of different cell types in a laboratory applying both CA-IEF and IPG focusing at a common temperature. To this end we have generated 2-D gel maps of proteins from noncultured, unfractionated normal human epidermal keratinocytes with IPG in the first dimension and a focusing temperature of 25 °C. We have used commercial nonlinear, wide-range IPG strips which give 2-D gel maps that are closely similar to those obtained with the CA-IEF technique used to establish the human keratinocyte database [15]. As an initial step towards interlaboratory comparisons of results obtained with the nonlinear gradient as a first dimension, we report here on the focusing positions of 41 known proteins that are common to most human cell types. The pH range covered corresponds to the range in classical CA-IEF 2-D gel electrophoresis, and in order to use these proteins as internal standards for comparing 2-D gel maps generated with other IPGs, we determined their pI values with narrow-range IPGs in the first dimension. We have compared the calculated versus experimental pI values and show that further information (absence or presence and nature of posttranslational modifications), in addition to the amino acid composition, is necessary to calculate pI values that correspond to the actual experimental values. The pK values used for the calculations are provided, and the usefulness of pI prediction in relation to database information is discussed. Furthermore, we comment on the possibility of using experimentally determined pI values to verify the available database information on polypeptide composition.



2 Materials and methods 

2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS electrophoresis (Multiphor II electrophoresis chamber, Immobiline strip tray, Multidrive XL programmable power supply, Macrodrive power supply and MultiTemp II) was from Pharmacia LKB Biotechnology AB (Uppsala, Sweden). Vertical second-dimensional gels were run in the home-made equipment described in [15]. The IPG strips with the wide-range nonlinear pH gradient were either Immobiline DryStrip pH 3-10 NL, 180 mm, or alternatively 160 mm long IPG strips with a corresponding pH gradient. In both cases the IPG strips were supplied by Pharmacia LKB. Immobiline, Pharmalyte, Ampholine, GelBond and PAG film, as well as the ready-made horizontal SDS gels (ExcelGel XL SDS 12-14), were also from Pharmacia LKB. Purified proteins and peptides were from Sigma (St. Louis, MO).

2.2 Sample preparation 

Preparation and labeling of unfractionated keratinocytes, as well as fibroblasts, have been described in [16]. Cells were lysed in a solution containing 9.8 M urea, 2% w/v NP-40, 100 mM DTT and 2% v/v Ampholine pH 7-9.

2.3 2-D gel electrophoresis 

First-dimensional focusing was performed according to Görg et al. [2] with some minor modifications, as described in [9]. The IPG strips were rehydrated in a solution containing 9.8 M urea, 2% w/v CHAPS, 10 mM DTT and 2% v/v carrier ampholyte mixture. The carrier ampholyte mixture consisted of 2 parts Pharmalyte 4-6.5, 1 part Ampholine pH 6-8 and 1 part Pharmalyte pH 8-10.5. Usually, cathodic sample application was used, and the samples were diluted 2-20 times in a solution containing 9.8 M urea, 4% w/v CHAPS, 1% w/v DTT and 35 mM Tris base. For anodic application, the Tris base was replaced with 100 mM acetic acid. The degree of dilution and the sample volume (20-100 µL) depended on the particular sample and IPG, and on whether the proteins were to be visualized by Coomassie Brilliant Blue or silver staining. With the wide-range nonlinear IPG, 10-30 µg of total protein was loaded for silver staining and 100-200 µg for Coomassie staining. Focusing was done overnight, with Vh products in the range of 45-60 kVh for 160 mm long strips and 50-70 kVh for 180 mm long strips. Solubilization of polypeptides and blocking of -SH groups prior to the second-dimensional run, as well as loading on the second-dimensional gel, were done as described in [9]. The stacking gel was omitted, and 5-10 mm were left at the top of the second-dimensional gel for applying the IPG strip. This space was filled with electrode buffer containing 0.5% w/v agarose. Casting, running, staining and autoradiography were carried out as described in [15].

2.4 Experimental determination of pI values

The pK differences between Immobilines pK 4.6, pK 6.2 and pK 7.0, needed to calibrate the pH scale at 25 °C in 9.8 M urea, were determined as described in [9] with the same narrow-range IPGs. The pH scale was defined by setting the pK value of Immobiline pK 4.6 equal to 4.61 [9], and the determined pK differences gave pK values of 5.73 and 6.54 for Immobilines pK 6.2 and pK 7.0, respectively. The pK differences found are in good agreement with values derived from [17] and [8] by extrapolation to 9.8 M urea. As in [9], additional narrow-range recipes have been used for determining pI values. With narrow-range IPGs extending to pH values higher than the pK value of Immobiline pK 7.0, anodic sample application was used, with acetic acid added to the sample solution. Otherwise, cathodic sample application was used with the same sample buffer as for wide-range IPGs.
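The calibration above fixes one reference point and places every other Immobiline on the scale by adding its measured pK difference. A minimal Python sketch of that bookkeeping follows; the reference value 4.61 is from the text, while the two differences (1.12 and 1.93) are back-calculated here from the reported pK values (5.73 - 4.61 and 6.54 - 4.61) and are not listed as such in this section. The function name is hypothetical.

```python
from typing import Dict

# Reference point of the pH scale, as stated in the text:
# the pK of Immobiline pK 4.6 is set to 4.61 [9].
REFERENCE_PK = 4.61


def calibrate_pk_scale(reference_pk: float, measured_differences: Dict[str, float]) -> Dict[str, float]:
    """Place each Immobiline on the pH scale: pK = reference pK + measured pK difference."""
    return {name: reference_pk + delta for name, delta in measured_differences.items()}


# Differences back-calculated from the reported values (assumption, see lead-in).
scale = calibrate_pk_scale(
    REFERENCE_PK,
    {"Immobiline pK 6.2": 1.12, "Immobiline pK 7.0": 1.93},
)

if __name__ == "__main__":
    for name, pk in scale.items():
        print(f"{name}: {pk:.2f}")
```

Because only differences are measured, a different choice of reference value would shift the whole scale uniformly; the relative positions of the Immobilines, and hence of the focused proteins, would be unchanged.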

2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions are from the Swiss-Prot database [18]. For vimentin, we used the data from [19], where the amino acid at position 41 is D instead of S. Information in the Swiss-Prot database on phosphorylation has been disregarded, because it was known from earlier studies (J. E. Celis, unpublished results) that the spots in question corresponded to the unphosphorylated forms of the peptides.



2.6 Calculation of pI values

For the pI calculations it was assumed that the same pK value could be used for an amino acid residue in all polypeptides and in all positions in the peptide, except for N- or C-terminally placed amino acids. For the pK values of the N-terminal amino groups, the effect of the different substituents on the α-carbon was taken into account. The pI values were calculated with the aid of the IPG-maker program [20].

2.7 pK values used for pI calculations

For the carboxyl-terminal group and internal glutamyl and aspartyl residues, the same pK values were used as in [9]. For C-terminal glutamyl and aspartyl residues, separate pK values were derived with the aid of the Taft equations [9, 21]. The pK values of histidyl groups were calculated from the pI values of human carbonic anhydrase I as in [9]. For N-terminal glycine a pK value of 7.50 was used. The pK shift caused by a substituent on the α-carbon was assumed to be identical with the pK shift the substituent causes for the amino group in the free amino acid, i.e. 2.28 pH units were subtracted from the pK values of the amino groups of the amino acids given in [22, 23]. The approximate pK value of 9 for the cysteinyl group was taken from [24]. For tyrosyl and arginyl groups we used the pK values of the amino acids [22, 23]. For lysyl groups, the effect of high urea concentration on amino groups was taken into account, and 0.5 pH units were subtracted from the amino acid pK value. These last three pK values are far from the pH range under study, and the results would have been the same if lysyl and arginyl groups had been assumed to be fully ionized and the ionization of tyrosyl groups had been neglected. A complete list of the pK values used is given in Table 1.

Table 1. pK values used for the ionizable groups in peptides (9.8 M urea, 25 °C)

Ionizable group                  pK
C-terminal                       3.55
N-terminal
  Ala                            7.59
  Met                            7.00
  Ser                            6.93
  Pro                            8.36
  Thr                            6.82
  Val                            7.44
  Glu                            7.70
Internal
  Asp                            4.05
  Glu                            4.45
  His                            5.98
  Cys                            9
  Tyr                            10
  Lys                            10
  Arg                            12
C-terminal side-chain groups
  Asp                            4.55
  Glu                            4.75



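Under the assumptions of Sections 2.6 and 2.7, the net charge of a polypeptide at a given pH is a sum of independent Henderson-Hasselbalch terms, and the pI is the pH where that sum is zero. The minimal Python sketch below illustrates this with the Table 1 values (plus 7.50 for N-terminal glycine from the text). It is not the IPG-maker program [20] used by the authors; the function names are hypothetical, N-terminal residues not covered by Table 1 require a caller-supplied pK, and the only posttranslational modification modeled is a blocked (e.g. acetylated) N-terminus, which simply removes the N-terminal amino group charge. It also estimates buffer capacity as -dZ/dpH in charge units per pH unit, the quantity discussed in Section 3.2.

```python
from typing import Optional

C_TERM_PK = 3.55  # C-terminal carboxyl group (Table 1)
N_TERM_PK = {"A": 7.59, "M": 7.00, "S": 6.93, "P": 8.36, "T": 6.82,
             "V": 7.44, "E": 7.70, "G": 7.50}  # G: value stated in Section 2.7
INTERNAL_ACIDIC_PK = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
INTERNAL_BASIC_PK = {"H": 5.98, "K": 10.0, "R": 12.0}
C_TERM_SIDE_CHAIN_PK = {"D": 4.55, "E": 4.75}


def _basic_fraction(ph: float, pk: float) -> float:
    """Protonated (+1) fraction of a basic group."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def _acidic_fraction(ph: float, pk: float) -> float:
    """Deprotonated (-1) fraction of an acidic group."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


def net_charge(seq: str, ph: float, n_term_blocked: bool = False,
               fallback_n_term_pk: Optional[float] = None) -> float:
    """Average net charge of a linear polypeptide (one-letter code) at pH."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    charge = -_acidic_fraction(ph, C_TERM_PK)  # free C-terminal carboxyl
    if not n_term_blocked:  # a blocked N-terminus contributes no charge
        pk = N_TERM_PK.get(seq[0], fallback_n_term_pk)
        if pk is None:
            raise ValueError(f"no N-terminal pK for residue {seq[0]!r}")
        charge += _basic_fraction(ph, pk)
    last = len(seq) - 1
    for i, aa in enumerate(seq):
        if i == last and aa in C_TERM_SIDE_CHAIN_PK:
            charge -= _acidic_fraction(ph, C_TERM_SIDE_CHAIN_PK[aa])
        elif aa in INTERNAL_ACIDIC_PK:
            charge -= _acidic_fraction(ph, INTERNAL_ACIDIC_PK[aa])
        elif aa in INTERNAL_BASIC_PK:
            charge += _basic_fraction(ph, INTERNAL_BASIC_PK[aa])
    return charge


def isoelectric_point(seq: str, n_term_blocked: bool = False,
                      fallback_n_term_pk: Optional[float] = None,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; valid because the net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(seq, mid, n_term_blocked, fallback_n_term_pk) > 0.0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


def buffer_capacity(seq: str, ph: float, n_term_blocked: bool = False,
                    fallback_n_term_pk: Optional[float] = None,
                    h: float = 1e-3) -> float:
    """-dZ/dpH in charge units per pH unit, by a central difference."""
    z_up = net_charge(seq, ph + h, n_term_blocked, fallback_n_term_pk)
    z_down = net_charge(seq, ph - h, n_term_blocked, fallback_n_term_pk)
    return (z_down - z_up) / (2.0 * h)


if __name__ == "__main__":
    seq = "MEEEDDK"  # hypothetical toy peptide, not a protein from Table 2
    pi_free = isoelectric_point(seq)
    pi_blocked = isoelectric_point(seq, n_term_blocked=True)
    print(f"pI, free N-terminus:    {pi_free:.2f}")
    print(f"pI, blocked N-terminus: {pi_blocked:.2f}")
    print(f"buffer capacity at pI:  {buffer_capacity(seq, pi_free):.2f}")
```

Near the pI the net charge is approximately linear in pH with slope equal to minus the buffer capacity, so an error of a given size in the calculated charge shifts the predicted pI by roughly that error divided by the buffer capacity. This is one way to see why proteins with high buffer capacity tend to show smaller pI discrepancies.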
2.8 Statistical analysis 

Statistical comparisons of the experimental and calculated pI values were done on an Apple Macintosh IIsi using the statistical package Statistica/Mac, release 3.0b (StatSoft Inc., Tulsa, Oklahoma). Calculated and experimental pI values were compared by the t-test for correlated samples (paired t-test). The normality of the pI differences was assessed graphically with probability plots. The variances of the data presented here and of the similar data on plasma and liver proteins in [9] were compared by the F-test.
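The two tests named above reduce to simple statistics on the pI data. The minimal sketch below uses only the Python standard library and returns the test statistics with their degrees of freedom; the p-values reported in this work came from Statistica/Mac and require the t or F distribution (with SciPy available, `scipy.stats.ttest_rel` gives the paired-test p-value directly). The function names and example inputs are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(calculated: Sequence[float], experimental: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for d = calculated - experimental, with n - 1 degrees of freedom."""
    if len(calculated) != len(experimental):
        raise ValueError("paired samples must have equal length")
    diffs = [c - e for c, e in zip(calculated, experimental)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("differences have zero spread; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def variance_ratio(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, int]:
    """F = larger sample variance / smaller, with (numerator df, denominator df)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    if min(va, vb) == 0.0:
        raise ValueError("a sample variance is zero; F is undefined")
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1


if __name__ == "__main__":
    # Hypothetical pI values, for illustration only.
    t, df = paired_t([5.10, 5.62, 6.03, 4.87], [5.12, 5.60, 6.05, 4.86])
    print(f"paired t = {t:.3f} with {df} df")
```

In this setting `variance_ratio` would be applied to two sets of pI discrepancies (or calculated charges), for example the present data versus the plasma and liver data of [9].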

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes, focused with the nonlinear, wide-range IPG and with CA-IEF pH gradients in the first dimension, are shown in Figs. 1 and 2, respectively. The IPG extends to higher pH values, but otherwise the two patterns are very similar, and most of the spots in the IPG pattern can be directly related to the corresponding spots in the CA-IEF gel. To obtain comparable patterns it was important to keep the focusing temperatures as similar as possible. Compared with other studies [1-4, 9, 10, 12-14], we increased the urea concentration in the focusing gel to 9.8 M, because keratins streaked badly in the focusing dimension when 8 M urea was used, presumably due to aggregates of acidic and basic keratins. Increasing the urea concentration to 9 M or more eliminated these streaks; apart from this effect, no other major changes in the focusing positions were observed. In Fig. 1 we have indicated the positions of 41 known proteins from the human keratinocyte 2-D gel database that are most likely common to most human cell types. These proteins were chosen because they are easy to identify with certainty. With the exception of stratifin (spot 2), involucrin (spot 4) and keratin 14 (spot 15), which are all epithelial markers, these proteins are also present in human fibroblasts (Fig. 3) and lymphocytes (results not shown), and can therefore be used as landmarks for comparing 2-D gel maps derived from different cell types. In Table 2 the 41 proteins are listed together with their sample spot numbers (SSP) in the human keratinocyte protein database and the pI values determined in 2-D gel maps generated with narrow-range IPGs in the first dimension.

Figure 1. 2-D gel protein map of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes focused with the nonlinear, wide-range IPG in the first dimension. The positions of the 41 proteins analyzed in this study are indicated.

Figure 2. 2-D gel protein map of [35S]methionine-labeled proteins from noncultured, unfractionated normal human keratinocytes focused with CA-IEF in the first dimension. The positions of the 41 proteins analyzed in this study are indicated.
3.2 Comparison between the determined and calculated pI values for human keratinocyte proteins

Thirty-six of the 41 proteins listed in Table 2 are found in the Swiss-Prot database. In contrast to the plasma and liver proteins used in [9], the pI calculations on the proteins used in this study posed some problems that reflected the way in which they were characterized. The proteins used by Bjellqvist et al. [9] were either abundant, well-characterized plasma proteins or had been identified by N-terminal sequencing; in both cases, therefore, the nature of the N-terminus (acetylated or nonacetylated) was known. The proteins used in this study have all been characterized by internal sequencing [7], and N-terminal acetylation is known to occur with high frequency in eukaryotes. According to Brown and Roberts [25], proteins with acetylated N-termini account for approximately 80% by weight of the soluble protein in ascites cells. Based on results from N-terminal sequencing, at least 40% of the spots in the human liver protein 2-D gel map appear to be blocked [3]. The corresponding figure, derived from 107 spots in the 2-D gel map of human T-lymphocyte proteins, is between 60 and 65% (J. Strahler, personal communication). Information on N-terminal blockage is not normally available, and in the Swiss-Prot database only 6 of the 36 keratinocyte proteins are specified as N-terminally blocked. Within the present material, we have defined 18 proteins for which the N-termini are very likely to be correctly described. Six of these proteins are listed in the Swiss-Prot database as N-terminally blocked, four are proteins which appear in the human liver 2-D gel map and have been N-terminally sequenced as liver proteins [3], and the remaining eight have N-terminal groups other than M, S and A, i.e. N-termini for which N-acetylation is uncommon [26]. In Figs. 4A-D, pI values calculated from Swiss-Prot database information are plotted against the experimentally determined pI values for all the keratinocyte proteins listed in Table 2 and for the 18 selected proteins, as well as for the plasma and liver proteins from [9] (valid for 10 °C)*.

The calculations show that without knowledge of the 
status of the \-terminaI group, precise predictions of p/ 
values for eukaryotic proteins cannot be achieved based 
on the information available in Swiss-Prot and similar 
databases. However, for proteins where the A'-terminal 
status is known, we find good correlation between pre- 
dicted and experimental pi values. When the variance of 
the pi discrepancies and the variance of calculated 
charges at the experimental pi values derived from the 
present data set are compared with the corresponding 



• There are lour plots: iA> the 3c» polypeptides from normal human 
keratinocytes (no corrections). (B) the 5e> polv peptides from F»c - \ 
where p/ values have been recalculated lor \2 polypeptides with M. 
S and A as .V-iermtnaily assumed blocked, based on calculated 
charge. (C) the 18 selected polypeptides with information on the 
Vierminal configuration, and (D) plasma and liver proteins. 



Figure 4. Calculated vs. experimental pI values. Lines are fitted using the least-squares criterion. (A) 36 polypeptides from normal human keratinocytes (no corrections). (B) 36 polypeptides from Fig. 4A (including the 18 marker polypeptides) where pI values have been recalculated assuming N-terminal blockage; x indicates recalculated pI values; nucleolar protein B23 is indicated with an arrow. (C) 18 polypeptides with information on N-terminal configuration and (D) plasma and liver proteins.

values derived from the data on plasma and liver proteins in [9] (Table 3), the present data are found to result in larger variances for both the pI discrepancies and the calculated charge at the experimental pI value when no information on posttranslational modification is taken into consideration. Correction for possible N-acetylation of 12 polypeptides with M, S and A as N-terminal results in a smaller variance of pI discrepancies, although not significantly different from values derived from [9], whereas the variance of the calculated charge at the experimental pI value is significantly higher. For the 18 selected proteins the variance of the pI discrepancies is significantly smaller than for the data in [9]; however, the corresponding value for calculated charge at the experimental pI value does not improve to the same extent. This, we believe, reflects another difference between the two sets of proteins used for the calculations. Based on spot distributions in 2-D gel maps, the set of proteins used here has a molecular weight distribution that is more representative of the patterns observed in mammalian cells. In the study by Bjellqvist et al. [9] most of the high molecular weight plasma proteins had to be excluded because of their unknown sialic acid content, which made the proteins analyzed in that study heavily biased towards low molecular weight proteins. The buffer capacity of proteins normally increases with molecular weight: the average buffer capacity of the presently selected proteins with assumed known N-terminals is 18 charge units/pH unit, while the corresponding value for the proteins used in [9] is only 9 charge units/pH unit. High buffer capacity can be expected to improve the agreement between calculated and experimental pI values, and inspection of the data presented in Table 2 for the polypeptides with assumed known N-terminals confirms the importance of the buffer capacity. For 8 polypeptides with buffer capacities higher than 15 charge units/pH unit, the calculations in all cases yielded pI discrepancies with absolute values of less than 0.02 pH units. The largest discrepancy, 0.06 pH units, was observed for annexin II and stathmin, proteins with low buffer capacities of 0.9 and 6.6 charge units/pH unit, respectively. The probability that the focusing position of a protein with known composition will fall within a certain distance from the calculated pI value therefore cannot be predicted from the variance alone; the buffer capacity of the specific protein must be taken into account as well. As indicated by the decrease in the variance of calculated charges at the experimental pI value for the selected proteins, the observed improvement cannot be due solely to the higher buffer capacity of the keratinocyte proteins. The two studies relate to different experimental conditions. Good agreement between experimental and calculated pI values implies that the proteins are unfolded, and one factor that may contribute to the observed improvement is a more complete unfolding of the proteins caused by the higher temperature and urea concentration used in this study.
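The variance comparisons in this paragraph are the F-tests defined in the footnotes of Table 3 (F = larger variance / smaller variance, p taken from the upper tail of the F distribution). As a minimal sketch, assuming degrees of freedom of n - 1 for each set, the Python code below recomputes F and p from the tabulated variances using only the standard library. The regularized incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method, an implementation choice made here for self-containment, not something stated in the text. Because the tabulated variances are rounded, the recomputed p-levels only approximately reproduce the table.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_test(var1: float, n1: int, var2: float, n2: int) -> tuple[float, float]:
    """F = larger variance / smaller variance; p = P(F(df_num, df_den) >= F)."""
    if var1 >= var2:
        f, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
    else:
        f, df_num, df_den = var2 / var1, n2 - 1, n1 - 1
    # Upper tail of the F distribution expressed through the incomplete beta function
    p = reg_inc_beta(df_den / 2.0, df_num / 2.0, df_den / (df_den + df_num * f))
    return f, p


if __name__ == "__main__":
    # Variances of the pI discrepancies from Table 3; the plasma/liver set (n = 29) is the reference.
    ref_var, ref_n = 0.005, 29
    for label, var, n in [("all 36", 0.017, 36), ("36 after correction", 0.003, 36),
                          ("18 known N-terminal", 0.001, 18)]:
        f, p = f_test(var, n, ref_var, ref_n)
        print(f"{label}: F = {f:.2f}, p = {p:.4f}")
```

The same function applies to the charge rows of Table 3 by passing the variances of the calculated charge instead.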

Table 3. Mean values and variances for the difference (experimental pI - calculated pI), in pH units, and for the calculated charge at the experimental pI value, respectively

Columns: (1) plasma and liver proteins (8 M urea, 10 °C); keratinocyte proteins (9.8 M urea, 25 °C): (2) all polypeptides, (3) all polypeptides after correction for N-acetylation, (4) polypeptides with known or very likely N-terminal configuration

                                                  (1)       (2)       (3)       (4)
Number of proteins                                 29        36        36        18
Experimental pI - calculated pI
  Mean                                          -0.011     0.072     0.019     0.005
  Variance                                       0.005     0.017     0.003     0.001
  F-value (pI discrepancy) a)                    1         3.4       1.67      5
  p-level (pI discrepancy) b)                    0.5       0.0005    0.0721    0.0004
Calculated charge at the experimental pI value
  Mean                                          -0.070     0.321     0.009    -0.014
  Variance                                       0.227     0.871     0.444     0.109
  F-value (calculated charge) a)                 1         3.8       1.96      2.08
  p-level (calculated charge) b)                 0.5       0.0002    0.0338    0.0536

a) Comparison to the data in [9]: F = S1²/S2², where S1² is the larger of the two variances.
b) p = P(F(ν1, ν2) ≥ F-value), where ν1 and ν2 are the degrees of freedom for S1² and S2², respectively.

The data indicate that the precision with which pI values can be predicted for polypeptides with high buffer capacity is better than the precision with which experimental pI values can be determined. If the pH is defined through the pK values of the immobilized groups in the IPG-containing gel, the precision of the experimental data will depend on the pH difference between the pI and the pK value of the immobilized group with the closest pK. For the present study this gives pI determinations with a precision in the range of ±0.02-0.05 pH units [9]. The good agreement observed between the calculated and experimental pI values arises because the errors are mainly systematic and, as discussed in [9], largely cancel out in the calculations. A pH scale defined through the presently determined pI values will not necessarily reflect the variation of the hydrogen ion activity during the focusing step in an optimal way, but it still allows precise predictions of focusing positions for polypeptides with known compositions, including information on posttranslational modifications. The calculated net charge at the experimentally found isoelectric point, defined in this scale, will serve as a tool to verify that the polypeptide composition used in the calculation is correct and complete. Exceptions are proteins such as involucrin and heat shock protein 90 that have very high buffer capacities. Introducing an extra charge unit into these proteins shifts the pI by only 0.01-0.02 pH units; in these cases, the quality of the pH definition (the precision with which the pK values used in the calculations are given) and the precision of the experimental pI values limit the possibility of verifying polypeptide composition from the experimental pI value.
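The buffer-capacity argument can be illustrated numerically. To first order, a change ΔZ in net charge shifts the pI by about ΔZ/β, where β = -dZ/dpH is the buffer capacity at the pI, so one extra charge unit moves the pI of a protein with β ≈ 50-100 charge units/pH unit by only 0.01-0.02 pH units. The Python sketch below computes the net charge Z of a denatured polypeptide from its composition with the Henderson-Hasselbalch equation, finds the pI by bisection, and estimates β by a central difference. The pK values are rounded average values assumed for illustration only; they are not necessarily the set used in this study, residue-specific terminal pK values are ignored, and the example sequence is hypothetical.

```python
# Assumed, rounded average pK values (illustrative only; not necessarily the set used in this study).
PK_POSITIVE = {"nterm": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
PK_NEGATIVE = {"cterm": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}


def net_charge(seq: str, ph: float, n_blocked: bool = False) -> float:
    """Net charge of a fully denatured polypeptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["nterm"] = 0 if n_blocked else 1  # N-acetylation removes the N-terminal amino charge
    counts["cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10.0 ** (ph - pk)) for g, pk in PK_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10.0 ** (pk - ph)) for g, pk in PK_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, n_blocked: bool = False, tol: float = 1e-6) -> float:
    """pH at which the net charge is zero, found by bisection (charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, n_blocked) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def buffer_capacity(seq: str, ph: float, n_blocked: bool = False, h: float = 1e-4) -> float:
    """Buffer capacity in charge units per pH unit, estimated as -dZ/dpH (central difference)."""
    return -(net_charge(seq, ph + h, n_blocked) - net_charge(seq, ph - h, n_blocked)) / (2.0 * h)


if __name__ == "__main__":
    short = "MSDEKAHRE"   # hypothetical composition, not a real protein
    longer = short * 10   # same composition, ten times the size (higher buffer capacity)
    for name, seq in [("short", short), ("longer", longer)]:
        pi_free = isoelectric_point(seq)
        pi_blocked = isoelectric_point(seq, n_blocked=True)
        beta = buffer_capacity(seq, pi_free)
        print(f"{name}: pI {pi_free:.2f}, N-blocked pI {pi_blocked:.2f}, "
              f"shift {pi_free - pi_blocked:.3f}, 1/beta {1.0 / beta:.3f}")
```

Removing the N-terminal charge shifts the pI of the short sequence by a large fraction of a pH unit, whereas the tenfold-larger sequence shifts much less, roughly by 1/β, in line with the first-order relation.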

Experimental and calculated pI values were compared statistically using the t-test for dependent samples, and normality of the discrepancies was assessed with probability plots. For the 36 proteins, the p-level is 0.0021, indicating that a result like this is unlikely to be a chance effect and must be assumed to represent a real difference. After correction for the most likely N-terminal configuration, the p-level is 0.043; the two sets still cannot be accepted as representing the same population, since the p-level is below 0.05, the traditional limit of statistical significance. For the 18 proteins with a known or very likely N-terminal configuration, the t-test gave a p-level of 0.49, which shows that the experimental and calculated pI values are not significantly different.
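Table 3 lists the mean and variance of the pI discrepancies, which is enough to reproduce the t-test for dependent samples: t = mean / sqrt(variance / n) with n - 1 degrees of freedom. The Python sketch below does this with the standard library only; the two-sided p-value is obtained by integrating the Student t density numerically (Simpson's rule), an implementation choice made here for self-containment. Because the tabulated means and variances are rounded, the recomputed p-levels only approximately match the reported 0.0021, 0.043 and 0.49.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * (integral of the t density from 0 to |t|), Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * s * h / 3.0


def paired_t_from_summary(mean_diff: float, var_diff: float, n: int) -> tuple[float, float]:
    """t-test for dependent samples from the mean and sample variance of n paired differences."""
    t = mean_diff / math.sqrt(var_diff / n)
    return t, two_sided_p(t, n - 1)


def paired_t(experimental: list[float], calculated: list[float]) -> tuple[float, float]:
    """Same test from raw paired values; differences are experimental pI - calculated pI."""
    diffs = [e - c for e, c in zip(experimental, calculated)]
    return paired_t_from_summary(mean(diffs), variance(diffs), len(diffs))


if __name__ == "__main__":
    # Mean and variance of the pI discrepancies, taken from Table 3
    for label, m, v, n in [("all 36", 0.072, 0.017, 36), ("36 after correction", 0.019, 0.003, 36),
                           ("18 known N-terminal", 0.005, 0.001, 18)]:
        t, p = paired_t_from_summary(m, v, n)
        print(f"{label}: t = {t:.2f}, p = {p:.3f}")
```

With raw data, paired_t accepts the two lists of pI values directly; the summary form is used here because only means and variances are tabulated.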

Besides showing that pI values for denatured proteins with known compositions can be calculated with a high degree of precision from average pK values, the results also provide strong support for the notion that N-terminal blockage depends heavily on the nature of the N-terminal groups [26]. The results seem to indicate that with N-terminals other than M, S and A, only a few proteins have blocked N-terminals (1 out of 10 proteins in the present study), while it can be inferred from the data presented in Table 2 that a majority of the proteins with M, S or A as N-terminal are blocked. After correction for the effect of suspected N-terminal blockage, only one protein out of the 36 used in this study, nucleolar protein B23, shows a marked difference between predicted and determined pI values (0.11 pH units; Fig. 4B) in spite of a high buffer capacity; because of this high buffer capacity, the discrepancy corresponds to 3 charge units. This discrepancy in pI prediction and in the calculated net charge at the pI is probably not due to deficiencies in the database information but instead reflects a shortcoming of the model used for pI calculations. Nucleolar protein B23 contains a domain extremely rich in aspartic and glutamic acid residues (Table 4), in which 26 out of 28 amino acid residues from position 161 to 188 are either D or E. A calculation based on average pK values, uninfluenced by charged neighboring amino acid residues, cannot be expected to describe the pI correctly when almost half of the acidic groups are packed



Table 4. Amino acid sequence of nucleolar phosphoprotein B23




together into a highly negatively charged region. This limitation of calculations based on average pK values does not severely restrict the usefulness of the approach, since a search through Swiss-Prot shows that this type of D/E-rich motif is uncommon, and the existence of a highly charged region is immediately apparent upon inspection of the amino acid sequence.
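Such D/E-rich stretches can also be flagged automatically before a pI calculation. The Python sketch below slides a window along a sequence and reports every window whose count of D and E residues reaches a threshold. The defaults (a 28-residue window with at least 26 acidic residues) simply mirror the B23 example; the function name and defaults are hypothetical, and the test sequence is synthetic, not the B23 sequence.

```python
def acidic_windows(seq: str, window: int = 28, min_acidic: int = 26) -> list[tuple[int, int, int]]:
    """Return (start, end, count) for windows containing at least min_acidic D/E residues.

    Positions are 1-based and inclusive, matching the usual residue numbering.
    """
    seq = seq.upper()
    hits: list[tuple[int, int, int]] = []
    if len(seq) < window:
        return hits
    # Count D/E in the first window, then update the count as the window slides by one residue.
    count = sum(aa in "DE" for aa in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            count += (seq[start + window - 1] in "DE") - (seq[start - 1] in "DE")
        if count >= min_acidic:
            hits.append((start + 1, start + window, count))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: 26 consecutive acidic residues embedded in neutral flanks
    demo = "A" * 10 + "DE" * 13 + "GK" + "A" * 10
    for start, end, n_acidic in acidic_windows(demo):
        print(f"residues {start}-{end}: {n_acidic} D/E")
```

Overlapping hits mark a single acidic region; for proteins flagged this way, a pI calculated from average pK values should be treated with extra caution.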

The quality of the information available in databases, especially concerning posttranslational modifications, is a major problem when the data are to be used for pI predictions. The p-level of 0.043 found for all 36 proteins after correction for N-acetylation shows that this problem is not limited to N-terminal blockage, and the very good agreement found for the eighteen polypeptides with presumably correctly described N-terminals (Fig. 4C) must be regarded as an exception from this point of view. N-terminal blockage is generally the main problem in relation to pI predictions for eukaryotic proteins. Of the 36 keratinocyte proteins analyzed, 18-20 are suspected to be N-terminally blocked (6 proteins blocked according to Swiss-Prot; 12 proteins with M, S or A as N-terminal and presumably blocked based on the calculated charge; and two proteins, involucrin and nucleolar protein B23, with M as N-terminal, for which the data do not allow any conclusion). This is in reasonable agreement with the conclusions based on the N-terminal sequencing data derived in connection with 2-D gel electrophoresis: N-terminal blockage can be suspected for 17-19 of the 26 proteins with M, S or A as N-terminal, while only 1 in 10 proteins with other N-terminal groups is blocked. The finding that the frequency of N-terminal blockage is strongly related to the nature of the N-terminal group will be of some help for pI predictions based on database information. However, without information from other sources, some uncertainty will always remain as to whether the N-terminal charge should be included in the pI calculation.



4 Concluding remarks 

The data presented here lay the foundation for comparing 2-D gel protein maps of different cell types generated with nonlinear, wide-range IPGs in the first dimension. The focusing positions of 41 polypeptides common to most human cell types have been described in a pH scale that allows focusing positions to be predicted with a high degree of accuracy, provided that the composition of the polypeptides is known and that information on posttranslational modifications is available. For polypeptides with a very high buffer capacity, the limiting factor is the precision with which experimental pI values can be determined rather than the precision of the calculations. Possible deficiencies in the pH scale's description of the variation of the hydrogen ion activity have, at least at present, no consequences for its practical use. The major limitation in predicting focusing positions from polypeptide compositions is the quality of existing data on protein compositions, especially concerning posttranslational modifications. Amino acid sequences have been reasonably easy to obtain, while posttranslational modifications have been difficult and labor-intensive to determine. Recent developments in mass spectrometry are rapidly changing this situation, and within the next few years we can expect a surge in reliable data in this area. While awaiting this development, the correctness and completeness of available information on polypeptide composition can be verified by experimental pI values in a pH scale based on the pI values determined in this study. So far, our data cover the pH range below pH 7.5. The basic pH range, covered by NEPHGE as the first dimension, will be addressed in forthcoming work.

Received December 29, 1995